Abstract

Concatenation is a method of building long codes out of shorter ones; it attempts to meet the problem of decoding complexity by breaking the required computation into manageable segments. We present theoretical and computational results bearing on the efficiency and complexity of concatenated codes; the major theoretical results are the following:

1. Concatenation of an arbitrarily large number of codes can yield a probability of error that decreases exponentially with the over-all block length, while the decoding complexity increases only algebraically; and

2. Concatenation of a finite number of codes yields an error exponent that is inferior to that attainable with a single stage, but is nonzero at all rates below capacity.

Computations support these theoretical results, and also give insight into the relationship between modulation and coding.

This approach illuminates the special power and usefulness of the class of Reed-Solomon codes. We give an original presentation of their structure and properties, from which we derive the properties of all BCH codes; we determine their weight distribution, and consider in detail the implementation of their decoding algorithm, which we have extended to correct both erasures and errors and have otherwise improved. We show that on a particularly suitable channel, RS codes can achieve the performance specified by the coding theorem.

Finally, we present a generalization of the use of erasures in minimum-distance decoding, and discuss the appropriate decoding techniques, which constitute an interesting hybrid between decoding and detection.
I. INTRODUCTION


It is almost twenty years since Shannon announced the coding theorem. The promise of that theorem was great: a probability of error exponentially small in the block length at any information rate below channel capacity. Finding a way of implementing even moderately long codes, however, proved much more difficult than was imagined at first. Only recently, in fact, have there been invented codes and decoding methods powerful enough to improve communication system performance significantly yet simple enough to be attractive to build.

The work described here is an approach to the problem of coding and decoding complexity. It is based on the premise that we may not mind using codes from 10 to 100 times longer than the coding theorem proves to be sufficient, if, by so doing, we arrive at a code that we can implement. The idea is basically that used in designing any large system: break the system down into subsystems of a size that can be handled, which can be joined together to perform the functions of the large system. A system so designed may be suboptimal in comparison with a single system designed all of a piece, but as long as the nonoptimalities are not crippling, the segmented approach may be the preferred engineering solution.

1.1 CODING THEOREM FOR DISCRETE MEMORYLESS CHANNELS

The coding theorem is an existence theorem. It applies to many types of channels, but generally it is similar to the coding theorem for block codes on discrete memoryless channels, which will now be stated in its most modern form.

A discrete memoryless channel has I inputs $x_i$, J outputs $y_j$, and a characteristic transition probability matrix $p_{ji} = \Pr(y_j \mid x_i)$. On each use of the channel, one of the inputs $x_i$ is selected by the transmitter. The conditional probability that the receiver then observes the output $y_j$ is $p_{ji}$; the memorylessness of the channel implies that these probabilities are the same for each transmission, regardless of what happened on any other transmission. A code word of length N for such a channel then consists of a sequence of N symbols, each of which comes from an I-symbol alphabet and denotes one of the I channel inputs; upon the transmission of such a word, a received word of length N becomes available to the receiver, where now the received symbols are from a J-symbol alphabet and correspond to the channel outputs. A block code of length N and rate R (nats) consists of $e^{NR}$ code words of length N. Clearly $e^{NR} \le I^N$; sometimes we shall use the dimensionless rate r, $0 \le r \le 1$, defined by $I^{rN} = e^{NR}$ or $R = r \ln I$.

The problem of the receiver is generally to decide which of the $e^{NR}$ code words was sent, given the received word; a wrong choice we call an error. We shall assume that all code words are equally likely; then the optimal strategy for the receiver in principle, though rarely feasible, is to compute the probability of getting the received word, given each code word, and to choose that code word for which this probability is greatest; this strategy is called maximum-likelihood decoding. The coding theorem then asserts that there exists a block code of length N and rate R such that with maximum-likelihood decoding the probability of decoding error is bounded by

$$\Pr(e) \le e^{-NE(R)},$$

where E(R), the error exponent, is characteristic of the channel, and is positive for all rates less than C, called capacity.
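To make the maximum-likelihood rule concrete, the following Python sketch scores every code word by the probability of the received word under a memoryless transition matrix and picks the largest. The function name `ml_decode`, the layout `p[j][i]` for $\Pr(y_j \mid x_i)$, and the toy repetition code are illustrative assumptions. The exhaustive loop over all $e^{NR}$ words is exactly what makes this rule rarely feasible for long codes.

```python
import math
from typing import Sequence

def ml_decode(received: Sequence[int],
              code: Sequence[Sequence[int]],
              p: Sequence[Sequence[float]]) -> int:
    """Return the index of the code word maximizing Pr(received | word).

    p[j][i] is the transition probability Pr(y_j | x_i). The channel is
    memoryless, so the word likelihood is a product of symbol likelihoods,
    accumulated here as a sum of logarithms to avoid underflow.
    """
    def log_likelihood(word: Sequence[int]) -> float:
        total = 0.0
        for y, x in zip(received, word):
            prob = p[y][x]
            if prob == 0.0:
                return -math.inf          # this word cannot produce the received word
            total += math.log(prob)
        return total

    # Exhaustive search: cost grows linearly with the number of code words, e^{NR}.
    return max(range(len(code)), key=lambda k: log_likelihood(code[k]))

# Toy example: length-3 binary repetition code over a BSC with crossover .01.
bsc = [[0.99, 0.01],
       [0.01, 0.99]]
repetition = [[0, 0, 0], [1, 1, 1]]
print(ml_decode([1, 0, 1], repetition, bsc))  # prints 1
```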


Fig. 1. E(R) curve for BSC with p = .01.


Figure 1 shows the error exponent for the binary symmetric channel whose crossover probability is .01, that is, the discrete memoryless channel with transition probability matrix $p_{11} = p_{22} = .99$, $p_{12} = p_{21} = .01$. As is typical, this curve has three segments: two convex curves joined by a straight-line segment of slope -1. Gallager has shown that the high-rate curved segment and the straight-line part of the error exponent are given by
$$E(R) = \max_{0 \le \rho \le 1} \max_{P} \{E_0(\rho, P) - \rho R\},$$

where

$$E_0(\rho, P) = -\ln \sum_{j=1}^{J} \left( \sum_{i=1}^{I} P_i \, p_{ji}^{1/(1+\rho)} \right)^{1+\rho},$$
P being any I-dimensional vector of probabilities $P_i$; this is called the unexpurgated error exponent, in deference to the fact that a certain purge of poor code words is involved in the argument which yields the low-rate curved segment, or expurgated error exponent. An analogous formula exists for the exponent when the inputs and outputs form continuous rather than discrete sets. It should be mentioned that a lower bound to Pr(e) is known which shows that in the range of the high-rate curved segment, this exponent is the true one, in the sense that there is no code which can attain $\Pr(e) \le e^{-NE^*(R)}$ for $E^*(R) > E(R)$ and N arbitrarily large.
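The unexpurgated exponent can be evaluated numerically. This Python sketch takes the binary symmetric channel of Fig. 1, assumes the equiprobable input vector $P = (1/2, 1/2)$ (optimal for a symmetric channel), and maximizes $E_0(\rho, P) - \rho R$ over a grid of $\rho \in [0, 1]$. The function names and grid resolution are illustrative choices, and only the high-rate curve and straight-line segment are covered, not the expurgated low-rate part.

```python
import math

def e0_bsc(rho: float, p: float) -> float:
    """Gallager's E_0(rho, P) for a BSC with crossover p and uniform inputs."""
    s = 1.0 / (1.0 + rho)
    # Inner sum over inputs i of P_i * p_ji^(1/(1+rho)); identical for both outputs j.
    inner = 0.5 * p ** s + 0.5 * (1.0 - p) ** s
    # Outer sum over the J = 2 outputs, then -ln.
    return -math.log(2.0 * inner ** (1.0 + rho))

def exponent_bsc(rate: float, p: float, steps: int = 1000) -> float:
    """E(R) = max over 0 <= rho <= 1 of E_0(rho) - rho * R, with R in nats."""
    return max(e0_bsc(k / steps, p) - (k / steps) * rate
               for k in range(steps + 1))

def capacity_bsc(p: float) -> float:
    """Capacity in nats: ln 2 - H(p)."""
    return math.log(2.0) + p * math.log(p) + (1.0 - p) * math.log(1.0 - p)

p = 0.01
C = capacity_bsc(p)
for R in (0.0, 0.2, 0.4, 0.6, C):
    print(f"R = {R:.3f} nats  E(R) = {exponent_bsc(R, p):.4f}")
```

Below the critical rate the maximizing $\rho$ is 1, so the computed values follow $E_0(1, P) - R$, the straight-line segment of slope -1; at $R = C$ the exponent falls to zero.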

Thus for any rate less than capacity, the probability of error can be made to decrease exponentially with the block length. The deficiencies of the coding theorem are that it does not specify a particular code that achieves this performance, nor does it offer an attractive decoding method. The former deficiency is not grave, since the relatively easily implemented classes of linear codes and convolutional codes contain members satisfying the coding theorem. It has largely been the decoding problem that has stymied the application of codes to real systems, and it is this problem which concatenation attempts to meet.


1.2 CONCATENATION APPROACH

The idea behind concatenated codes is simple. Suppose we set up a coder and decoder for some channel; then the coder-channel-decoder chain can be considered from the outside as a superchannel with exp NR inputs (the code words), exp NR outputs (the decoder's guesses), and a transition probability matrix characterized by a high probability of getting the output corresponding to the correct input. If the original channel is memoryless, the superchannel must be also, if the code is not changed from block to block. It is now reasonable to think of designing a code for the superchannel of length n, dimensionless rate r, and with symbols from an $e^{NR}$-symbol alphabet. This done, we can abandon the fiction of the superchannel, and observe that we have created a code for the original channel of length nN, with $(e^{NR})^{nr}$ code words, and therefore rate rR (nats). These ideas are illustrated in Fig. 2, where the two codes and their associated coders and decoders are labelled inner and outer, respectively.
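The bookkeeping of Fig. 2 can be checked with a little arithmetic. In this Python sketch the inner code is assumed to have $2^K$ words of length N (so $q = e^{NR} = 2^K$ and $R = (K \ln 2)/N$ nats) and the outer code $q^k$ words of length n (so $r = k/n$). The integers K and k, the function `concatenate`, and the example values are hypothetical conveniences, not notation from the text.

```python
import math
from dataclasses import dataclass

@dataclass
class Concatenated:
    length: int          # over-all block length nN, in raw-channel symbols
    log_words: float     # natural log of the number of code words
    rate: float          # over-all rate in nats per raw-channel symbol

def concatenate(N: int, K: int, n: int, k: int) -> Concatenated:
    """Inner code: 2**K words of length N. Outer code: q**k words of length n, q = 2**K."""
    R = K * math.log(2) / N      # inner rate in nats, so that q = e^{NR} = 2**K
    r = k / n                    # outer dimensionless rate
    log_words = n * r * N * R    # ln of (e^{NR})^{nr}
    return Concatenated(length=n * N, log_words=log_words, rate=log_words / (n * N))

# Hypothetical example: inner length-15 code with 2**5 words,
# outer length-31 code over q = 32 with 32**15 words.
c = concatenate(N=15, K=5, n=31, k=15)
print(c.length, round(c.rate, 4))
```

The over-all rate comes out as the product rR of the outer dimensionless rate and the inner rate, as stated above.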

By concatenating codes, we can achieve very long codes, capable of being decoded by two decoders suited to much shorter codes. We thus realize considerable savings in complexity, but at some sacrifice in performance. In Section V we shall find that this sacrifice comes in the magnitude of the attainable error exponent; however, we find that the attainable probability of error still decreases exponentially with block length for all rates less than capacity.

Fig. 2. Illustrating concatenation.

The outer code will always be one of a class of nonbinary BCH codes called Reed-Solomon codes, first because these are the only general nonbinary codes known, and second, because they can be implemented relatively easily, both for coding and for decoding. But furthermore, we discover in Section V that under certain convenient suppositions about the superchannel, these codes are capable of matching the performance of the coding theorem. Because of their remarkable suitability for our application, we devote considerable time in Section III to development of their structure and properties, and in Section IV to the detailed exposition of their decoding algorithm.

1.3 MODULATION

The functions of any data terminal are commonly performed by a concatenation of devices; for example, a transmitting station might consist of an analog-to-digital converter, a coder, a modulator, and an antenna. Coding theory is normally concerned only with the coding stage, which typically accepts a stream of bits and delivers to the modulator a coded stream of symbols. Up to this point, only the efficient design of this stage has been considered, and in the sequel this concentration will largely continue, since this problem is most susceptible to analytical treatment.

By raw channel, we mean whatever of the physical channel and associated terminal equipment are beyond our design control. It may happen that the channel already exists in such a form, say, with a certain kind of repeater, that it must be fed binary symbols, and in this case the raw channel is discrete. Sometimes, however, we have more freedom to choose the types of signals, the amount of bandwidth, or the amount of diversity to be used, and we must properly consider these questions together with coding to arrive at the most effective and economical signal design.

When we are thus free to select some parameters of the channel, the channel contemplated by algebraic coding theory, which, for one thing, has a fixed number of inputs and outputs, is no longer a useful model. A more general approach to communication theory, usually described under the headings modulation theory, signal design, and detection theory, is then appropriate. Few general theoretical results are obtainable in these disciplines, which must largely be content with analyzing the performance of various interesting systems. Section VI reports the results of a computational search for coding schemes meeting certain standards of performance, where both discrete raw channels and channels permitting some choice of modulation are considered. This gives considerable insight into the relationship between modulation and coding. In particular it is shown that nonbinary modulation with relatively simple codes can be strikingly superior either to complicated modulation with no coding, or to binary modulation with complicated binary codes.

1.4 CHANNELS WITH MEMORY

Another reason for the infrequent use of codes in real communication systems has been that real channels are usually not memoryless. Typically, a channel will have long periods in which it is good, causing only scattered random errors, separated by short bad periods or bursts of noise. Statistical fluctuations having such an appearance will be observed even on a memoryless channel; the requirement of long codes imposed by the coding theorem may be interpreted as insuring that the channel be used for enough transmissions that the probability of a statistical fluctuation bad enough to cause an error is very small indeed. The coding theorem can be extended to channels with memory, but now the block lengths must generally be very much longer, so that the channel has time to run through all its tricks in a block length.

If a return channel from the receiver to the transmitter is available, it may be used to adapt the coding scheme at the transmitter to the type of noise currently being observed at the receiver, or to request retransmission of blocks which the receiver cannot decode. Without such a feedback channel, if the loss of information during bursts is unacceptable, some variant of a technique called interlacing is usually envisioned. In interlacing, the coder codes n blocks of length N at once, and then transmits the n first symbols, the n second symbols, and so forth through the n Nth symbols. At the receiver the blocks are unscrambled and decoded individually. It is clear that a burst of length $b \le n$ can affect no more than one symbol in any block, so that if the memory time of the channel is of the order of n or less the received block of nN symbols will generally be decodable.
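A minimal Python sketch of interlacing as just described: n code blocks of length N are transmitted column by column, so consecutive channel symbols belong to different blocks. The function names are illustrative, and a burst is modeled simply as a run of corrupted stream positions.

```python
from typing import List, Sequence

def interlace(blocks: Sequence[Sequence[int]]) -> List[int]:
    """Send the first symbols of all n blocks, then all second symbols, ..., then all Nth symbols."""
    n, N = len(blocks), len(blocks[0])
    return [blocks[i][j] for j in range(N) for i in range(n)]

def deinterlace(stream: Sequence[int], n: int) -> List[List[int]]:
    """Invert interlace(): symbol j of block i sits at stream position j*n + i."""
    N = len(stream) // n
    return [[stream[j * n + i] for j in range(N)] for i in range(n)]

def errors_per_block(sent: Sequence[Sequence[int]], got: Sequence[Sequence[int]]) -> List[int]:
    """Count the symbol errors in each block after unscrambling."""
    return [sum(a != b for a, b in zip(s, g)) for s, g in zip(sent, got)]

# n = 4 blocks of length N = 5; a burst hits b = 4 consecutive channel symbols.
blocks = [[10 * i + j for j in range(5)] for i in range(4)]
stream = interlace(blocks)
for pos in range(6, 6 + 4):
    stream[pos] = -1                      # mark the burst-corrupted symbols
print(errors_per_block(blocks, deinterlace(stream, 4)))  # prints [1, 1, 1, 1]
```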

Concatenation obviously shares the burst-resistant properties of interlacing when the memory time of the channel is of the order of the inner code block length or less, for a burst then will usually affect no more than one or two symbols in the outer code, which will generally be quite correctable. Because of the difficulty of constructing adequate models of real channels with memory, it is difficult to pursue analysis of the burst resistance of concatenated codes, but it may be anticipated that this feature will prove useful in real applications.
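The claim that a short burst touches only one or two outer symbols is simple counting. Assuming raw-channel symbols are numbered from 0 and grouped into consecutive inner blocks of length N (one outer symbol per inner block), this Python sketch counts the inner blocks a burst overlaps; `outer_symbols_hit` is a hypothetical helper name.

```python
def outer_symbols_hit(start: int, burst_len: int, N: int) -> int:
    """Number of length-N inner blocks overlapped by raw-channel positions start .. start+burst_len-1."""
    if burst_len <= 0:
        return 0
    first_block = start // N
    last_block = (start + burst_len - 1) // N
    return last_block - first_block + 1

# For any burst no longer than N, the count is 1 or 2, whatever the starting position.
N = 15
worst = max(outer_symbols_hit(s, b, N) for s in range(3 * N) for b in range(1, N + 1))
print(worst)  # prints 2
```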




1.5 CONCATENATING CONVOLUTIONAL CODES

We shall consider only block codes henceforth. The principles of concatenation are clearly applicable to any type of code. For example, a simple convolutional code with threshold decoding is capable of correcting scattered random errors, but when channel errors are too tightly bunched the decoder is thrown off stride for a while, and until it becomes resynchronized causes a great many decoding errors. From the outside, such a channel appears to be an ideal bursty channel, in which errors do not occur at all except in the well-defined bursts. Very efficient codes are known for such channels, and could be used as outer codes. The reader will no doubt be able to conceive of other applications.

1.6 OUTLINE

This report consists of largely self-sufficient sections, with two appendices. We anticipate that many readers will find that the material is arranged roughly in inverse order of interest. Therefore, we shall outline the substance of each section and the connections between them.

Section II begins with an elaborate presentation of the concepts of minimum-distance decoding, which has two purposes: to acquaint the reader with the substance and utility of these concepts, and to lay the groundwork for a generalization of the use of erasures in minimum-distance decoding. Though this generalization is an interesting hybrid between the techniques of detection and of decoding, it is not used subsequently.

Section III is an attempt to provide a fast, direct route for the reader of little background to an understanding of BCH codes and their properties. Emphasis is placed on the important nonbinary Reed-Solomon codes. Though the presentation is novel, the only new results concern the weight distribution of RS codes and the implementation of much shortened RS codes.

Section IV reports an extension of the Gorenstein-Zierler error-correcting algorithm for BCH codes so that both erasures and errors can be simultaneously corrected. Also, the final step in the GZ algorithm is substantially simplified. A close analysis of the complexity of implementing this algorithm with a computer concludes this section, and only the results of this analysis are used in the last two sections. Appendix A contains variants on this decoding algorithm of more restricted interest.

Section V contains our major theoretical results on the efficiency and complexity of concatenated codes, and Section VI reports the results of a computational program evaluating the performance of concatenated codes under a variety of specifications. The reader interested chiefly in the theoretical and practical properties of these codes will turn his attention first to Sections V and VI. Appendix B develops the formulas used in the computational program of Section VI.


II. MINIMUM-DISTANCE DECODING




We introduce here the concepts of distance and minimum-distance codes, and discuss how these concepts simplify decoding. We describe the use of erasures, and of a new generalization of erasures. Using the Chernoff bound, we discover the parameters of these schemes which maximize the probability of correct decoding; using the Gilbert bound, we compute the exponent of this probability for each of three minimum-distance decoding schemes over a few simple channels.

2.1 ERRORS-ONLY DECODING


In Section I we described how an inner code of length N and rate R could be concatenated with an outer code of length n and dimensionless rate r to yield a code of over-all length nN and rate rR for some raw channel. Suppose now one of the $e^{nNrR}$ words of this code is selected at random and transmitted. How do we decode what is received?

The optimum decoding rule remains what it always is when inputs are equally likely: the maximum-likelihood decoding rule. In this case, given a received sequence r of length nN, the rule would be to compute $\Pr(r \mid f)$ for each of the code words f.

The whole point of concatenation, however, is to break the decoding process into manageable segments, at the price of suboptimality. The basic simplification made possible by the concatenated structure of the code is that the inner decoder can decode (make a hard decision on) each received N-symbol sequence independently. In doing so, it is in effect discarding all information about the received N-symbol block except which of the $e^{NR}$ inner code words was most likely, given that block. This preliminary processing enormously simplifies the task of the outer decoder, which is to make a final choice of one of the $e^{nNrR}$ total code words.

Let $q = e^{NR}$. When the inner decoder makes a hard decision, the outer coder and decoder see effectively a q-input, q-output superchannel. We assume that the raw channel and thus the superchannel are memoryless. By a symbol error we shall mean the event in which any output but the one corresponding to the input actually transmitted is received. Normally, the probability of symbol error is low; it is then convenient to assume that all incorrect transmissions are equally probable, that is, to assume that the transition probability matrix of the superchannel is


$$p_{ji} = \begin{cases} 1 - p, & i = j \\ \dfrac{p}{q-1}, & i \ne j \end{cases} \tag{1}$$


where p is the probability of decoding error in the inner decoder, hence of symbol error in the superchannel. We call a channel with such a transition probability matrix an ideal superchannel with q inputs and probability of error p.
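The ideal superchannel of Eq. 1 is easy to build and to sample. In this Python sketch the q symbols are labeled 0 to q-1, `random.Random` stands in for the channel noise, and the helper names are illustrative assumptions.

```python
import random
from typing import List

def ideal_superchannel_matrix(q: int, p: float) -> List[List[float]]:
    """Transition matrix of Eq. 1: 1-p on the diagonal, p/(q-1) everywhere else."""
    off = p / (q - 1)
    return [[1.0 - p if i == j else off for i in range(q)] for j in range(q)]

def transmit(symbol: int, q: int, p: float, rng: random.Random) -> int:
    """Send one symbol: with probability p, replace it by one of the q-1 other symbols, chosen uniformly."""
    if rng.random() < p:
        wrong = rng.randrange(q - 1)            # uniform over the q-1 incorrect outputs
        return wrong if wrong < symbol else wrong + 1
    return symbol

rng = random.Random(1)
q, p = 32, 0.05
M = ideal_superchannel_matrix(q, p)
print(all(abs(sum(row) - 1.0) < 1e-12 for row in M))    # each row is a probability distribution
errors = sum(transmit(7, q, p, rng) != 7 for _ in range(100_000))
print(errors / 100_000)                                  # close to p
```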

Recall that the maximum-likelihood rule, given r, is to choose the input sequence f for which the probability of receiving r, given f, is greatest. When the channel is memoryless,

$$\Pr(r \mid f) = \prod_{i=1}^{n} \Pr(r_i \mid f_i).$$

But since log x is a monotonic function of x, this is equivalent to maximizing

$$\log \prod_{i=1}^{n} \Pr(r_i \mid f_i) = \sum_{i=1}^{n} \log \Pr(r_i \mid f_i). \tag{2}$$

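Now for an ideal superchannel, substituting Eq. 1 in Eq. 2, we want to maximize

$$\sum_{i=1}^{n} a'(r_i, f_i), \tag{3}$$

where

$$a'(r_i, f_i) = \begin{cases} \log \dfrac{p}{q-1}, & r_i \ne f_i \\ \log(1-p), & r_i = f_i. \end{cases}$$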


Define the Hamming weight $a(r_i, f_i)$ by

$$a(r_i, f_i) = \begin{cases} 0, & r_i = f_i \\ 1, & r_i \ne f_i. \end{cases} \tag{4}$$


Since

$$a'(r_i, f_i) = \log(1-p) + \log\frac{p}{(q-1)(1-p)} \, a(r_i, f_i),$$

maximizing Eq. 3 is the same as maximizing

$$n \log(1-p) + \log\frac{p}{(q-1)(1-p)} \sum_{i=1}^{n} a(r_i, f_i).$$

Under the assumption $p/(q-1) < 1-p$, the logarithm multiplying the sum is negative, so this is equivalent to minimizing

$$d_H(r, f) = \sum_{i=1}^{n} a(r_i, f_i). \tag{5}$$
$d_H(r, f)$ is called the Hamming distance between r and f, and is simply the number of places in which they differ. For an ideal superchannel, the maximum-likelihood decoding rule is therefore to choose that code word which is closest to the received word in Hamming distance.
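The equivalence just derived can be checked numerically. This Python sketch scores each word of a small, randomly drawn q-ary code both by the log-likelihood of Eq. 3 and by the Hamming distance of Eq. 5, confirms that the first is a decreasing affine function of the second when $p/(q-1) < 1-p$, and checks that both rules select a word at the same distance. The code, the received word, and the parameter values are hypothetical test data.

```python
import math
import random
from typing import Sequence

def hamming(r: Sequence[int], f: Sequence[int]) -> int:
    """Eq. 5: number of places in which r and f differ."""
    return sum(ri != fi for ri, fi in zip(r, f))

def log_likelihood(r: Sequence[int], f: Sequence[int], q: int, p: float) -> float:
    """Eq. 3: sum of a'(r_i, f_i) for the ideal superchannel of Eq. 1."""
    return sum(math.log(1 - p) if ri == fi else math.log(p / (q - 1))
               for ri, fi in zip(r, f))

rng = random.Random(0)
q, n, p = 8, 6, 0.1                        # satisfies p/(q-1) < 1-p
code = [[rng.randrange(q) for _ in range(n)] for _ in range(20)]
received = [rng.randrange(q) for _ in range(n)]

slope = math.log(p / ((q - 1) * (1 - p)))  # negative coefficient of the Hamming distance
affine = all(math.isclose(log_likelihood(received, f, q, p),
                          n * math.log(1 - p) + slope * hamming(received, f))
             for f in code)

best_ml = max(code, key=lambda f: log_likelihood(received, f, q, p))
best_hd = min(code, key=lambda f: hamming(received, f))
print(affine, hamming(received, best_ml) == hamming(received, best_hd))  # prints True True
```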
Although this distance has been defined between a received word and a code word, there is no difficulty in extending the definition to apply between any two code words. We then define the minimum distance of a code as the minimum Hamming distance between any two words in the code.
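For a small code the minimum distance can be found by comparing every pair of words. The Python sketch below does this for the binary (7,4) Hamming code, generated from one standard systematic generator matrix chosen here for illustration; any other code could be substituted. The pairwise search costs on the order of the square of the number of words, so it is practical only for toy examples.

```python
from itertools import combinations, product
from typing import List, Sequence

# Systematic generator matrix [I_4 | P] of a (7,4) Hamming code (illustrative choice).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(message: Sequence[int]) -> List[int]:
    """Binary linear encoding: code word = message * G over GF(2)."""
    return [sum(m * g for m, g in zip(message, column)) % 2 for column in zip(*G)]

def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    return sum(a != b for a, b in zip(u, v))

def minimum_distance(code: Sequence[Sequence[int]]) -> int:
    """Minimum Hamming distance between any two distinct words of the code."""
    return min(hamming(u, v) for u, v in combinations(code, 2))

code = [encode(m) for m in product((0, 1), repeat=4)]
print(len(code), minimum_distance(code))   # prints 16 3
```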

A code with large minimum distance is desirable on two counts. First, as we shall now show, it insures that all combinations of less than or equal to a certain number t of symbol errors in n uses of the channel will be correctable. For, suppose f is sent and t symbol errors occur, so that $r_i \ne f_i$ in t places. Then from Eq. 5

$$d_H(r, f) = t. \tag{6}$$


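Take some other code word g. We separate the places into three disjoint sets, such that

$$i \in \begin{cases} S_0 & \text{if } f_i = g_i \\ S_c & \text{if } f_i \ne g_i \text{ and } r_i = f_i \\ S_e & \text{if } f_i \ne g_i \text{ and } r_i \ne f_i. \end{cases} \tag{7}$$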

We note that the set $S_e$ can have no more than t elements. Now the distance between r and g,

$$d_H(r, g) = \sum_{i=1}^{n} a(r_i, g_i) = \sum_{i \in S_0} a(r_i, g_i) + \sum_{i \in S_c} a(r_i, g_i) + \sum_{i \in S_e} a(r_i, g_i), \tag{8}$$


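can be lower-bounded by use of the relations

$$\begin{aligned} a(r_i, g_i) &\ge a(g_i, f_i) = 0, && i \in S_0 \\ a(r_i, g_i) &= a(g_i, f_i) = 1, && i \in S_c \\ a(r_i, g_i) &\ge a(g_i, f_i) - 1 = 0, && i \in S_e. \end{aligned} \tag{9}$$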


Here, besides Eqs. 7, we have used $a \ge 0$ and the fact that for $i \in S_c$, $r_i \ne g_i$. Substituting (9) in (8) yields

$$d_H(r, g) \ge d_H(g, f) - |S_e| \ge d - t. \tag{10}$$

Here, we have defined $|S_e|$ as the number of elements in $S_e$ and used the fact that $d_H(g, f) \ge d$ if g and f are different words in a code with minimum distance d. By combining (6) and (10) we have proved that

$$d_H(r, f) < d_H(r, g) \quad \text{if } 2t < d. \tag{11}$$
In other words, if $t_0$ is the largest integer such that $2t_0 < d$, it is impossible for any combination of $t_0$ or fewer symbol errors to cause the received word to be closer to any other code word than to the sent word. Therefore no decoding error will occur.
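The guarantee of Eq. 11 can be confirmed exhaustively on a small code. This Python sketch uses the binary (7,4) Hamming code (d = 3, so $t_0 = 1$) from an illustrative generator matrix, adds every error pattern of weight at most $t_0$ to every code word, and checks that the sent word remains strictly the closest; the helper names are hypothetical.

```python
from itertools import combinations, product
from typing import List, Sequence

G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]      # illustrative (7,4) Hamming code, d = 3

def encode(message: Sequence[int]) -> List[int]:
    return [sum(m * g for m, g in zip(message, col)) % 2 for col in zip(*G)]

def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    return sum(a != b for a, b in zip(u, v))

code = [encode(m) for m in product((0, 1), repeat=4)]
d, n = 3, 7
t0 = (d - 1) // 2                 # largest integer with 2*t0 < d

def sent_word_strictly_closest(f: Sequence[int], errors: Sequence[int]) -> bool:
    """Flip the positions in `errors` and check that f is strictly closer than every other word."""
    r = [x ^ 1 if i in errors else x for i, x in enumerate(f)]
    return all(hamming(r, f) < hamming(r, g) for g in code if g != f)

ok = all(sent_word_strictly_closest(f, errs)
         for f in code
         for t in range(t0 + 1)
         for errs in combinations(range(n), t))
print(t0, ok)   # prints 1 True
```

With two errors the guarantee no longer holds for this code, consistent with the requirement $2t < d$.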

Another virtue of a large minimum distance follows from reinterpreting the argument above. Suppose we hypothesize the transmission of a particular code word; given the received word, this hypothesis implies the occurrence of a particular sequence of errors. If this sequence is such that the Hamming distance criterion of Eq. 11 is satisfied, then we say that the received word is within the minimum distance of that code word. (This may seem an unnecessarily elaborate way of expressing this concept, but, as in this whole development, we are taking great pains now so that the generalizations of the next two sections will follow easily.) Furthermore, the preceding argument shows that there can be no more than one code word within the minimum distance of the received word. Therefore, if by some means the decoder generates a code word that it discovers to be within the minimum distance of the received word, it can without further ado announce that word as its maximum-likelihood choice, since it knows that it is impossible that there be any other code word as close or closer to the received word. This property is the basis for a number of clever decoding schemes proposed recently, and will be used in the generalized minimum-distance decoding of section 2.3.

A final simplification that is frequently made is to set the outer decoder to decode only when there is a code word within the minimum distance of the received word. Such a scheme we call errors-only decoding. There will of course in general be received words beyond the minimum distance from all code words, and on such words an errors-only decoder will fail. Normally, a decoding failure is not distinguished from a decoding error, although it is detectable while an error is not.
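Errors-only decoding can be sketched as a bounded-distance search: announce a code word only if the number of errors it implies satisfies $2t < d$ (the criterion of Eq. 11), and otherwise report a failure. Exhaustive search over the code is assumed here purely for illustration; practical decoders, such as the algebraic ones of Section IV, instead generate a single candidate. The toy code is a length-4 binary repetition code with d = 4, and the function name is hypothetical.

```python
from typing import Optional, Sequence

def errors_only_decode(r: Sequence[int], code: Sequence[Sequence[int]], d: int) -> Optional[int]:
    """Return the index of the code word within the minimum distance of r, or None on failure.

    A word f qualifies when the implied number of errors t = d_H(r, f) satisfies 2t < d;
    at most one code word can qualify, so the first one found is the answer.
    """
    for k, f in enumerate(code):
        t = sum(a != b for a, b in zip(r, f))
        if 2 * t < d:
            return k
    return None                     # decoding failure: detectable, unlike a decoding error

repetition4 = [[0, 0, 0, 0], [1, 1, 1, 1]]                  # d = 4, so 2t < d means t <= 1
print(errors_only_decode([1, 0, 1, 1], repetition4, d=4))   # prints 1
print(errors_only_decode([1, 1, 0, 0], repetition4, d=4))   # prints None (failure)
```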

2.2 DELETIONS-AND-ERRORS DECODING

The simplifications of the previous section were bought, we recall, at the price of denying to the outer decoder all information about what the inner decoder received except which of the inner code words was most probable, given that reception. In this and the following section we investigate techniques of relaying somewhat more information to the outer decoder, hopefully without greatly complicating its task. These techniques are generalizations of errors-only decoding, and will be developed in the framework that has been introduced.

We continue to require the inner decoder to make a hard decision about which code word was sent. We now permit it to send along with its guess some indication of how reliable it considers its guess to be. In the simplest such strategy, the inner decoder indicates either that its guess is fully reliable or completely unreliable; the latter event is called a deletion or erasure. The inner decoder normally would delete whenever the evidence of the received word did not clearly indicate which code word was sent; also, a decoding failure, which can occur in errors-only decoding, would be treated as a deletion, with some arbitrary word chosen as the guess.
In order to make use of this reliability information in minimum-distance decoding, we define the Elias weight by

$$b(r_i, f_i) = \begin{cases} 0, & r_i \text{ reliable and } r_i = f_i \\ \beta, & r_i \text{ erased} \\ 1, & r_i \text{ reliable and } r_i \ne f_i, \end{cases} \tag{12}$$


where β is an arbitrary number between zero and one. Then the Elias distance between a received word r and a code word f is defined as

$$d_E(r, f) = \sum_{i=1}^{n} b(r_i, f_i). \tag{13}$$

Note that Elias distance is not defined between two code words.
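Eqs. 12 and 13 translate directly into code. In this Python sketch an erased position is represented by `None` in the received word, an assumed convention that simply discards the guess the inner decoder attaches to an erasure; β is passed as a parameter, and the function names are illustrative.

```python
from typing import Optional, Sequence

def elias_weight(r_i: Optional[int], f_i: int, beta: float) -> float:
    """Eq. 12: 0 if reliable and correct, beta if erased (None), 1 if reliable and wrong."""
    if r_i is None:
        return beta
    return 0.0 if r_i == f_i else 1.0

def elias_distance(r: Sequence[Optional[int]], f: Sequence[int], beta: float) -> float:
    """Eq. 13: sum of Elias weights between a received word r and a code word f."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between zero and one")
    return sum(elias_weight(ri, fi, beta) for ri, fi in zip(r, f))

# One erasure (s = 1) and one reliable error (t = 1) relative to f.
f = [3, 1, 4, 1, 5]
r = [3, None, 4, 2, 5]
print(elias_distance(r, f, beta=0.5))   # prints 1.5
```

The result is t + βs, which is how the distance to the transmitted word is expressed in the analysis that follows.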

We shall let our decoding rule be to choose that code word which is closest in Elias distance to the received word. Let us then suppose that some word f from a code of minimum (Hamming) distance d is transmitted, and in the n transmissions (i) s deletions occur, and (ii) t of the symbols classed as reliable are actually incorrect. Then

$$d_E(r, f) = t + \beta s. \tag{14}$$

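Take some other code word g. We separate the places into disjoint sets such that

$$i \in \begin{cases} S_0 & \text{if } f_i = g_i \\ S_c & \text{if } f_i \ne g_i,\ r_i = f_i,\ r_i \text{ reliable} \\ S_d & \text{if } f_i \ne g_i,\ r_i \text{ deleted} \\ S_e & \text{if } f_i \ne g_i,\ r_i \ne f_i,\ r_i \text{ reliable}. \end{cases} \tag{15}$$

Note that

$$|S_e| \le t \quad \text{and} \quad |S_d| \le s. \tag{16}$$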


Now the distance between r and g can be lower-bounded by the relations

$$\begin{aligned} b(r_i, g_i) &\ge a(g_i, f_i) = 0, && i \in S_0 \\ b(r_i, g_i) &= a(g_i, f_i) = 1, && i \in S_c \\ b(r_i, g_i) &= a(g_i, f_i) - 1 + \beta = \beta, && i \in S_d \\ b(r_i, g_i) &\ge a(g_i, f_i) - 1 = 0, && i \in S_e, \end{aligned} \tag{17}$$


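where we have used Eqs. 12 and 15. Now

$$\begin{aligned} d_E(r, g) &= \sum_{i=1}^{n} b(r_i, g_i) \\ &\ge \sum_{i \in S_0} a(g_i, f_i) + \sum_{i \in S_c} a(g_i, f_i) + \sum_{i \in S_d} [a(g_i, f_i) - 1 + \beta] + \sum_{i \in S_e} [a(g_i, f_i) - 1] \\ &= d_H(f, g) - (1 - \beta)|S_d| - |S_e| \\ &\ge d - (1 - \beta)s - t, \end{aligned} \tag{18}$$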

where we have used Eqs. 13, 16, 17 and the fact that the minimum Hamming distance between two code words is d. From Eqs. 14 and 18, we have proved that

$$d_E(r, g) > d_E(r, f) \quad \text{if } t + \beta s < d - (1 - \beta)s - t, \text{ or } 2t + s < d. \tag{19}$$

(The  vanishing  of  p  sho'v^s  why  we  took  it  to  be  arbitrary.)  Thus  with  a  decoding  rule 
based  on  Elias  distance,  we  are  assured  of  decoding  correctly  if  2t  4  s  <  d,  in  perfect 
analogy  to  errors-only  decoding,  \yhen  we  decode  only  out  to  the  minimum  distance  — 
that  is,  when  the  distance  criterion  of  (19)  is  apparently  satisfied  —  we  call  this  dele- 
tions-and-errors  decoding. 

That erasures could be used with minimum distance codes in this way has long been recognized, but few actual decoding schemes have been proposed. One of our chief concerns in Section III will be to develop a deletions-and-errors decoding algorithm for the important class of BCH codes. There we find that such an algorithm is very little more complicated than that appropriate to errors-only decoding.
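As an illustration, the Elias distance of Eq. 13 and the guarantee of Eq. 19 can be checked by brute force on a toy code. This is a minimal Python sketch, not an efficient decoder: the code is given as an explicit list of words, `None` marks an erased (deleted) symbol, and the names `elias_distance` and `deletions_and_errors_decode` are hypothetical.

```python
ERASED = None  # marker for an erased (deleted) symbol


def elias_distance(r, f, beta=0.5):
    """Elias distance of Eq. 13, summing the Elias weight of Eq. 12:
    0 for a reliable match, beta for an erasure, 1 for a reliable mismatch."""
    total = 0.0
    for ri, fi in zip(r, f):
        if ri is ERASED:
            total += beta
        elif ri != fi:
            total += 1.0
    return total


def deletions_and_errors_decode(r, code, d):
    """Brute-force deletions-and-errors decoding: return the code word f for
    which 2t + s < d (Eq. 19), or None if no code word qualifies."""
    s = sum(ri is ERASED for ri in r)
    for f in code:
        t = sum(ri is not ERASED and ri != fi for ri, fi in zip(r, f))
        if 2 * t + s < d:
            return f  # unique: 2t + s < d can hold for at most one code word
    return None


# Length-5 binary repetition code (d = 5): one error and two erasures.
code = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]
r = (1, ERASED, 0, 1, ERASED)
print(deletions_and_errors_decode(r, code, d=5))  # (1, 1, 1, 1, 1)
print(elias_distance(r, code[1]), elias_distance(r, code[0]))  # 2.0 3.0
```

Changing `beta` to any other value between zero and one leaves the decision unchanged, which is the point of the remark that its value is arbitrary.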


2.3 GENERALIZED MINIMUM-DISTANCE DECODING

A further step in the same direction, not previously investigated, is to permit the inner decoder to classify its choice in one of a group of $J$ reliability classes $C_j$, $1\le j\le J$, rather than just two as previously. We define the generalized weight by

$$
c(r_i,f_i)=\begin{cases}
\beta_{cj}, & r_i \text{ in class } C_j \text{ and } r_i=f_i,\\
\beta_{ej}, & r_i \text{ in class } C_j \text{ and } r_i\neq f_i,
\end{cases}
\qquad (20)
$$

where $0\le\beta_{cj}\le\beta_{ej}\le 1$. It will develop that only the difference

$$
\alpha_j\equiv\beta_{ej}-\beta_{cj}
$$

of these weights is important: $\alpha_j$ will be called the reliability weight, or simply weight, corresponding to class $C_j$. We have $0\le\alpha_j\le 1$; a large weight corresponds to a class we consider quite reliable, and a small weight to a class considered unreliable; indeed, if $\alpha_j<\alpha_k$ we shall say class $C_j$ is less reliable than $C_k$. The case $\alpha_j=0$ corresponds to an erasure, and $\alpha_j=1$ to the fully reliable symbols of the preceding section.

Let us now define a generalized distance

$$
d_G(r,f)\equiv\sum_{i=1}^{n} c(r_i,f_i). \qquad (21)
$$


Again we suppose the transmission of some word $f$ from a code of minimum distance $d$, and the reception of a word in which $n_{cj}$ symbols are received correctly and placed in class $C_j$, and $n_{ej}$ are received incorrectly in $C_j$. Then

$$
d_G(r,f)=\sum_{j=1}^{J}\bigl(n_{cj}\beta_{cj}+n_{ej}\beta_{ej}\bigr). \qquad (22)
$$


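Take some other code word $g$, and define the sets $S_o$, $S_{cj}$, and $S_{ej}$ by

$$
i\in\begin{cases}
S_o & \text{if } f_i=g_i,\\
S_{cj} & \text{if } f_i\neq g_i,\ r_i=f_i,\ r_i \text{ in class } C_j,\\
S_{ej} & \text{if } f_i\neq g_i,\ r_i\neq f_i,\ r_i \text{ in class } C_j.
\end{cases}
\qquad (23)
$$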


Note that

$$
|S_{cj}|\le n_{cj},\qquad |S_{ej}|\le n_{ej}. \qquad (24)
$$

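Using Eqs. 20 and 23, we have

$$
\begin{aligned}
c(r_i,g_i) &\ge a(g_i,f_i)=0, && i\in S_o,\\
c(r_i,g_i) &= a(g_i,f_i)-1+\beta_{ej}=\beta_{ej}, && i\in S_{cj},\\
c(r_i,g_i) &\ge a(g_i,f_i)-1+\beta_{cj}=\beta_{cj}, && i\in S_{ej},
\end{aligned}
\qquad (25)
$$

where the second relation depends on $r_i=f_i\neq g_i$, $i\in S_{cj}$. Now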


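$$
\begin{aligned}
d_G(r,g) &= \sum_{i=1}^{n} c(r_i,g_i)\\
&\ge \sum_{i\in S_o} a(g_i,f_i)+\sum_{j=1}^{J}\Bigl[\sum_{i\in S_{cj}}\bigl(a(g_i,f_i)-1+\beta_{ej}\bigr)+\sum_{i\in S_{ej}}\bigl(a(g_i,f_i)-1+\beta_{cj}\bigr)\Bigr]\\
&= d_H(f,g)-\sum_{j=1}^{J}\bigl[(1-\beta_{ej})|S_{cj}|+(1-\beta_{cj})|S_{ej}|\bigr]\\
&\ge d-\sum_{j=1}^{J}\bigl[(1-\beta_{ej})n_{cj}+(1-\beta_{cj})n_{ej}\bigr].
\end{aligned}
\qquad (26)
$$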

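Thus, using Eqs. 22 and 26, we have proved that

$$
d_G(r,g)>d_G(r,f)\quad\text{if}\quad \sum_{j=1}^{J}\bigl[(1-\beta_{ej}+\beta_{cj})n_{cj}+(1-\beta_{cj}+\beta_{ej})n_{ej}\bigr]<d,
$$

or

$$
\sum_{j=1}^{J}\bigl[(1-\alpha_j)n_{cj}+(1+\alpha_j)n_{ej}\bigr]<d. \qquad (27)
$$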

Therefore if generalized distance is used as the decoding criterion, no decoding error will be made whenever $n_{cj}$ and $n_{ej}$ are such that the inequality of (27) is satisfied. When in addition we decode only out to the minimum distance, that is, whenever this inequality is apparently satisfied, we say we are doing generalized minimum-distance decoding.
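The left side of Eq. 27 is easy to evaluate for any candidate code word once each received symbol carries its class weight. The Python sketch below uses hypothetical helper names and one weight per symbol. As an illustration it also builds generalized weights from one particular choice, $\beta_{cj}=(1-\alpha_j)/2$ and $\beta_{ej}=(1+\alpha_j)/2$ (an assumption, not prescribed by the text), which satisfies $0\le\beta_{cj}\le\beta_{ej}\le 1$ and $\beta_{ej}-\beta_{cj}=\alpha_j$; with this choice the left side of Eq. 27 is exactly $2d_G(r,f)$.

```python
def gmd_criterion(r, f, alpha):
    """Left side of Eq. 27 for candidate code word f: each symbol contributes
    (1 - alpha_i) if it agrees with f and (1 + alpha_i) if it does not."""
    return sum((1 - a) if ri == fi else (1 + a) for ri, fi, a in zip(r, f, alpha))


def generalized_distance(r, f, alpha):
    """d_G of Eq. 21 with the illustrative choice beta_c = (1 - alpha)/2 and
    beta_e = (1 + alpha)/2 for each symbol's class weight alpha."""
    return sum((1 - a) / 2 if ri == fi else (1 + a) / 2
               for ri, fi, a in zip(r, f, alpha))


r = (1, 0, 0, 0, 1)                  # received word
alpha = (0.9, 0.1, 0.2, 0.3, 0.8)    # reliability weight of each symbol
for f in [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]:
    value = gmd_criterion(r, f, alpha)
    print(f, round(value, 3), value < 5)   # d = 5 for this repetition code
```

Here the all-ones word satisfies Eq. 27 (3.9 < 5) although it disagrees with three of the five received symbols, because those disagreements fall on the least reliable positions; the all-zero word gives 6.1.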

This generalization is not interesting unless we can exhibit a reasonable decoding scheme that makes use of this distance criterion. The theorem that appears below shows that a decoder which can perform deletions-and-errors decoding can be adapted to perform generalized minimum-distance decoding.

We imagine that for the purpose of allowing a deletions-and-errors decoder to work on a received word, we make a temporary assignment of the weight $\alpha'_j=1$ to the set of reliability classes $C_j$ for which $j\in R$, say, and of the weight $\alpha'_j=0$ to the remaining reliability classes $C_j$, $j\in E$, say. This means that provisionally all receptions in the classes $C_j$, $j\in E$, are considered to be erased, and all others to be reliable. We then let the deletions-and-errors decoder attempt to decode the resulting word, which it will be able to do if (see Eq. 27)

$$
2\sum_{j\in R} n_{ej}+\sum_{j\in E}\bigl(n_{cj}+n_{ej}\bigr)<d. \qquad (28)
$$


If it succeeds, it announces some code word which is within the minimum distance according to the Elias distance criterion of (28). We then take this announced word and see whether it also satisfies the generalized distance criterion of (27), now with the original weights $\alpha_j$. If it does, then it is the unique code word within the minimum distance of the received word, and can therefore be announced as the choice of the outer decoder.

We are not guaranteed to succeed with this method for any particular provisional assignment of the $\alpha'_j$. The following theorem and its corollary show, however, that a small number of such trials must succeed if the received word is within the minimum distance according to the criterion of Eq. 27.

Let the classes be ordered according to decreasing reliability, so that $\alpha_j\ge\alpha_k$ if $j<k$. Define the $J$-dimensional vector

$$
\boldsymbol{\alpha}\equiv(\alpha_1,\alpha_2,\ldots,\alpha_J).
$$

Let the sets $R_a$ consist of all $j\le a$, and $E_a$ of all $j\ge a+1$, $0\le a\le J$. Let $\boldsymbol{\alpha}'_a$ be the $J$-dimensional vector with ones in the first $a$ places and zeros thereafter, which represents the provisional assignment of weights corresponding to $R=R_a$ and $E=E_a$. The idea of the following theorem is that $\boldsymbol{\alpha}$ is inside the convex hull whose extreme points are the $\boldsymbol{\alpha}'_a$, while the expression on the left in Eq. 27 is a linear function of $\boldsymbol{\alpha}$, which must take on its minimum value over the convex hull at some extreme point, that is, at one of the provisional assignments $\boldsymbol{\alpha}'_a$.


THEOREM: If $\sum_{j=1}^{J}\bigl[(1-\alpha_j)n_{cj}+(1+\alpha_j)n_{ej}\bigr]<d$ and $\alpha_j\ge\alpha_k$ for $j<k$, there is some integer $a$ such that

$$
2\sum_{j=1}^{a} n_{ej}+\sum_{j=a+1}^{J}\bigl(n_{cj}+n_{ej}\bigr)<d.
$$


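Proof: Let

$$
f(\boldsymbol{\alpha})\equiv\sum_{j=1}^{J}\bigl[(1-\alpha_j)n_{cj}+(1+\alpha_j)n_{ej}\bigr].
$$

Here, $f$ is clearly a linear function of the $J$-dimensional vector $\boldsymbol{\alpha}$. Note that

$$
f(\boldsymbol{\alpha}'_a)=2\sum_{j=1}^{a} n_{ej}+\sum_{j=a+1}^{J}\bigl(n_{cj}+n_{ej}\bigr).
$$

We prove the theorem by supposing that $f(\boldsymbol{\alpha}'_a)\ge d$ for all $a$ such that $0\le a\le J$, and exhibiting a contradiction. For, let

$$
\begin{aligned}
\lambda_0 &\equiv 1-\alpha_1,\\
\lambda_a &\equiv \alpha_a-\alpha_{a+1}, \qquad 1\le a\le J-1,\\
\lambda_J &\equiv \alpha_J.
\end{aligned}
$$

We see that

$$
0\le\lambda_a\le 1,\quad 0\le a\le J,\qquad\text{and}\qquad \sum_{a=0}^{J}\lambda_a=1,
$$

so that the $\lambda_a$ can be treated as probabilities. But now

$$
\boldsymbol{\alpha}=\sum_{a=0}^{J}\lambda_a\boldsymbol{\alpha}'_a.
$$

Therefore

$$
f(\boldsymbol{\alpha})=f\Bigl(\sum_{a=0}^{J}\lambda_a\boldsymbol{\alpha}'_a\Bigr)=\sum_{a=0}^{J}\lambda_a f(\boldsymbol{\alpha}'_a)\ge d\sum_{a=0}^{J}\lambda_a=d
$$

(the second equality holds because $f$ is affine, that is, linear plus a constant, and the $\lambda_a$ sum to one). Thus if $f(\boldsymbol{\alpha}'_a)\ge d$ for all $a$, then $f(\boldsymbol{\alpha})\ge d$, in contradiction to the given conditions. Therefore $f(\boldsymbol{\alpha}'_a)$ must be less than $d$ for at least one value of $a$. Q.E.D.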
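The convex-combination argument of the proof can be checked numerically. In the minimal Python sketch below, the counts `n_c` and `n_e`, the distance `d`, and the weights are hypothetical numbers chosen for illustration (the weights are sorted in decreasing order, as the theorem assumes); `f`, `provisional`, and `convex_weights` are illustrative names.

```python
def f(alpha, n_c, n_e):
    """f(alpha) of the proof: sum_j [(1 - alpha_j) n_cj + (1 + alpha_j) n_ej]."""
    return sum((1 - a) * nc + (1 + a) * ne for a, nc, ne in zip(alpha, n_c, n_e))


def provisional(a, J):
    """alpha'_a: ones in the first a places (reliable), zeros after (erased)."""
    return [1.0] * a + [0.0] * (J - a)


def convex_weights(alpha):
    """lambda_0 = 1 - alpha_1, lambda_a = alpha_a - alpha_{a+1}, lambda_J = alpha_J."""
    J = len(alpha)
    return ([1.0 - alpha[0]]
            + [alpha[a - 1] - alpha[a] for a in range(1, J)]
            + [alpha[J - 1]])


alpha = [0.9, 0.6, 0.3, 0.0]                  # decreasing reliability
n_c, n_e, d = [10, 5, 3, 2], [0, 1, 1, 2], 13
lam = convex_weights(alpha)
values = [f(provisional(a, len(alpha)), n_c, n_e) for a in range(len(alpha) + 1)]
print(f(alpha, n_c, n_e), sum(l * v for l, v in zip(lam, values)))  # equal up to rounding
print([a for a, v in enumerate(values) if v < d])   # provisional assignments that succeed
```

With these numbers $f(\boldsymbol\alpha)=12<13=d$, and the provisional assignments with $a=2$, $3$, and $4$ all satisfy Eq. 28, as the theorem guarantees at least one must.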

The import of the theorem is that if there is some code word which satisfies the generalized distance criterion of Eq. 27, then there must be some provisional assignment, in which the least reliable classes are erased and the rest are not, that will enable a deletions-and-errors decoder to succeed in finding that code word. But a deletions-and-errors decoder can succeed only if there are apparently no errors and at most $d-1$ erasures, or one error and at most $d-3$ erasures, and so forth up to $t_0$ errors and at most $d-2t_0-1$ erasures, where $t_0$ is the largest integer such that $2t_0\le d-1$. If by a trial we then mean an operation in which the $d-1-2i$ least reliable symbols are erased, the resulting provisional word decoded by a deletions-and-errors decoder, and the resulting code word (if the decoder finds one) checked by Eq. 27, then we have the following corollary.

COROLLARY: $t_0+1\le(d+1)/2$ trials suffice to decode any received word that is within the minimum distance by the generalized distance criterion of (27), regardless of how many reliability classes there are.

The  maximum  number  of  trials  is  then  proportional  only  to  d.  Furthermore,  many 
of  the  trials  —  perhaps  all  —  may  succeed,  so  that  the  average  number  of  trials  may  be 
appreciably  less  than  the  maximum. 
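The trial procedure of the corollary can be sketched as follows. This is hypothetical Python, with a brute-force deletions-and-errors decoder standing in for a real algebraic one (such as the BCH decoder developed in Section III); each received symbol is treated as its own reliability class, so the weights are per symbol. Trial $i$ erases the $d-1-2i$ least reliable symbols, and any code word the deletions-and-errors decoder returns is accepted only if it passes Eq. 27.

```python
def deletions_and_errors_decode(r, code, d):
    """Brute force: the code word with 2t + s < d (Eq. 19), else None.
    None entries of r are erasures."""
    s = sum(x is None for x in r)
    for f in code:
        t = sum(x is not None and x != fx for x, fx in zip(r, f))
        if 2 * t + s < d:
            return f
    return None


def gmd_criterion(r, f, alpha):
    """Left side of Eq. 27 with one weight alpha_i per received symbol."""
    return sum((1 - a) if x == fx else (1 + a) for x, fx, a in zip(r, f, alpha))


def gmd_decode(r, alpha, code, d):
    """Generalized minimum-distance decoding in at most t0 + 1 trials."""
    least_reliable_first = sorted(range(len(r)), key=lambda i: alpha[i])
    t0 = (d - 1) // 2                       # largest t0 with 2*t0 <= d - 1
    for i in range(t0 + 1):
        erase = set(least_reliable_first[:d - 1 - 2 * i])
        trial = tuple(None if k in erase else x for k, x in enumerate(r))
        f = deletions_and_errors_decode(trial, code, d)
        if f is not None and gmd_criterion(r, f, alpha) < d:
            return f                        # passes the check of Eq. 27
    return None                             # no word within generalized distance


code = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]   # repetition code, d = 5
r = (1, 0, 0, 0, 1)                          # three errors if (1,1,1,1,1) was sent
alpha = (0.9, 0.1, 0.2, 0.3, 0.8)
print(gmd_decode(r, alpha, code, d=5))
```

Three of the five received symbols are zeros, so an errors-only decoder would choose the all-zero word here. Because the disagreements with the all-ones word fall on the three least reliable positions, the first trial (four erasures) already returns the all-ones word, and it passes Eq. 27 with a left side of 3.9.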

2.4  PERFORMANCE  OF  MINIMUM-DISTANCE  DECODING  SCHEMES 

Our  primary  objective  now  is  to  develop  exponentially  tight  bounds  on  the  probability 
of  error  achievable  with  the  three  types  of  minimum-distance  decoding  discussed  above, 
and  with  these  bounds  to  compare  the  performance  of  the  three  schemes. 

In the course of optimizing these bounds, however, we shall discover how best to assign the weights to the different reliability classes. Since the complexity of the decoder is unaffected by the number of classes which we recognize, we shall let each distinguishable N-symbol sequence of outputs $y_j$ form a separate reliability class, and let our analysis tell us how to group them. Under the assumption, as usual, that all code words are equally likely, the task of the inner decoder is to assign to the received $y_j$ an $x_j$ and an $\alpha_j$, where $x_j$ is the code word $x$ for which $\Pr(y_j\mid x)$ is greatest, and $\alpha_j$ is the reliability weight that we shall determine.


a. The Chernoff Bound

We shall require a bound on the probability that a sum of independent, identically distributed random variables exceeds a certain quantity.

The bounding technique that we use is that of Chernoff^17; the derivation which follows is due to Gallager. This bound is known to be exponentially tight, in the sense that no bound of the form $\Pr(e)\le e^{-nE}$, where $E$ is greater than the Chernoff bound exponent, can hold for arbitrarily large $n$.

Let $y_i$, $1\le i\le n$, be $n$ independent, identically distributed random variables, each with moment-generating function

$$
g(s)\equiv\overline{e^{sy}}=\sum_{y}\Pr(y)\,e^{sy},
$$

and semi-invariant moment-generating function

$$
\mu(s)\equiv\ln g(s).
$$

Define $y_{\max}$ to be the largest value that $y$ can assume, and

$$
\bar y\equiv\sum_{y} y\Pr(y).
$$

Let $Y$ be the sum of the $y_i$, and let $\Pr(Y\ge n\delta)$ be the probability that $Y$ exceeds $n\delta$, where $y_{\max}\ge\delta\ge\bar y$. Then


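$$
\Pr(Y\ge n\delta)=\sum_{y_1,\ldots,y_n}\Pr(y_1,y_2,\ldots,y_n)\,f(y_1,y_2,\ldots,y_n),
$$

where

$$
f(y_1,y_2,\ldots,y_n)=\begin{cases}1, & Y=\sum_i y_i\ge n\delta,\\ 0, & \text{otherwise}.\end{cases}
$$

Clearly, for any $s\ge 0$, we can bound $f$ by

$$
f(y_1,y_2,\ldots,y_n)\le e^{s(Y-n\delta)}.
$$

Then

$$
\Pr(Y\ge n\delta)\le\overline{e^{s(Y-n\delta)}}=e^{-sn\delta}\,\overline{e^{s\sum_i y_i}}=e^{-sn\delta}\prod_{i=1}^{n}\overline{e^{sy_i}}=e^{-n[s\delta-\mu(s)]},\qquad s\ge 0,
$$

where we have used the fact that the average of a product of independent random variables is the product of their averages. To get the tightest bound, we maximize over $s$, and let

$$
E(\delta)\equiv\max_{s\ge 0}\,[s\delta-\mu(s)].
$$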

Setting the derivative of the bracketed quantity to zero, we obtain

$$
\delta=\mu'(s)=\frac{g'(s)}{g(s)}.
$$

It can easily be shown that $\mu'(0)=\bar y$, $\mu'(\infty)=y_{\max}$, and that $\mu'(s)$ is a monotonically increasing function of $s$. Therefore if $y_{\max}\ge\delta\ge\bar y$, there is a non-negative $s$ for which $\delta=\mu'(s)$, and substitution of this $s$ in $s\delta-\mu(s)$ gives $E(\delta)$.
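Because $s\delta-\mu(s)$ is concave in $s$ (the semi-invariant function $\mu$ is convex), $E(\delta)$ can also be found by a simple one-dimensional search instead of solving $\delta=\mu'(s)$ explicitly. The Python sketch below is illustrative: the helper names are hypothetical, the search interval $0\le s\le s_{\max}$ is an assumption, and the distribution is given as a dictionary from values $y$ to probabilities.

```python
import math


def mu(s, dist):
    """Semi-invariant moment-generating function mu(s) = ln sum_y Pr(y) e^{sy}."""
    return math.log(sum(p * math.exp(s * y) for y, p in dist.items()))


def chernoff_exponent(delta, dist, s_max=50.0, iters=200):
    """E(delta) = max over 0 <= s <= s_max of [s*delta - mu(s)], by ternary
    search, which is valid because the bracketed quantity is concave in s."""
    lo, hi = 0.0, s_max
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if m1 * delta - mu(m1, dist) < m2 * delta - mu(m2, dist):
            lo = m1
        else:
            hi = m2
    s = (lo + hi) / 2
    return s * delta - mu(s, dist)


# A fair die rescaled to y in {0, 0.2, ..., 1}: bound on Pr(Y >= 0.8 n).
die = {k / 5: 1 / 6 for k in range(6)}
E = chernoff_exponent(0.8, die)
print(E, [math.exp(-n * E) for n in (10, 100)])
```

Any $s\ge 0$ gives a valid bound, so restricting the search to $[0,s_{\max}]$ can only weaken the exponent, never invalidate it.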

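As an example, which will be useful later, consider the variable $y$ which takes on the value one with probability $p$ and zero with probability $1-p$. Then

$$
\begin{aligned}
g(s) &= p\,e^{s}+1-p,\\
\delta &= \mu'(s)=\frac{p\,e^{s}}{p\,e^{s}+1-p},\\
e^{s} &= \frac{\delta(1-p)}{p(1-\delta)},\qquad \mu(s)=\ln\frac{1-p}{1-\delta},\\
E(\delta) &= s\delta-\mu(s)=-\delta\ln p-(1-\delta)\ln(1-p)-\mathcal{H}(\delta),
\end{aligned}
$$

where

$$
\mathcal{H}(\delta)\equiv-\delta\ln\delta-(1-\delta)\ln(1-\delta).
$$

Then if $1\ge\delta\ge p$,

$$
\Pr(Y\ge n\delta)\le e^{-n[-\delta\ln p-(1-\delta)\ln(1-p)-\mathcal{H}(\delta)]}.
$$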

This can be interpreted as a bound on the probability of getting at least $n\delta$ occurrences of a certain event in $n$ independent trials, where the probability of that event in a single trial is $p$.
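For this binary case the bound can be compared directly with the exact binomial tail. A small Python check, with hypothetical helper names and illustrative values of $n$, $p$, and $\delta$:

```python
import math


def entropy(x):
    """H(x) = -x ln x - (1 - x) ln(1 - x) in nats, with 0 ln 0 taken as 0."""
    return -sum(t * math.log(t) for t in (x, 1 - x) if t > 0)


def bernoulli_exponent(delta, p):
    """E(delta) = -delta ln p - (1 - delta) ln(1 - p) - H(delta), for p <= delta <= 1."""
    return -delta * math.log(p) - (1 - delta) * math.log(1 - p) - entropy(delta)


def binomial_tail(n, k, p):
    """Exact probability of at least k occurrences in n independent trials."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


n, p, delta = 100, 0.1, 0.3
k = round(n * delta)
print(binomial_tail(n, k, p), math.exp(-n * bernoulli_exponent(delta, p)))
```

The exact tail printed first lies below the Chernoff bound printed second, as it must.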

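From this result we can derive one more fact which we shall need. Let $p=1/2$; then, for $1\ge\delta\ge 1/2$,

$$
\Pr(Y\ge n\delta)=\sum_{i=n\delta}^{n}\binom{n}{i}2^{-n}\le e^{-n[\ln 2-\mathcal{H}(\delta)]}.
$$

Since the sum contains its $i=n\delta$ term, it follows that

$$
\binom{n}{n\delta}\le e^{n\mathcal{H}(\delta)},
$$

and by the symmetry of both sides under $\delta\to 1-\delta$ this holds for all $0\le\delta\le 1$.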
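The binomial-coefficient bound is easy to confirm exhaustively for moderate $n$. A minimal Python check (the function names are hypothetical):

```python
import math


def entropy(x):
    """H(x) in nats, with 0 ln 0 taken as 0."""
    return -sum(t * math.log(t) for t in (x, 1 - x) if t > 0)


def binomial_bound_holds(n):
    """Check C(n, k) <= e^{n H(k/n)} for every 0 <= k <= n."""
    return all(math.comb(n, k) <= math.exp(n * entropy(k / n)) for k in range(n + 1))


print(all(binomial_bound_holds(n) for n in (1, 10, 100, 1000)))
```

The comparison is between an exact integer and a floating-point exponential; the bound is never close to equality except at $k=0$ and $k=n$, where both sides equal one.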
b. Optimization of Weights

We now show that the probability of decoding error or failure for minimum-distance decoding is the probability that a certain sum of independent identically distributed random variables exceeds a certain quantity, and therefore that we can use the Chernoff bound.

Let a code word from a code of length $n$ and minimum distance $d$ be transmitted. We know already that a minimum-distance decoder will fail to decode or decode incorrectly if and only if

$$
\sum_{j}\bigl[n_{cj}(1-\alpha_j)+n_{ej}(1+\alpha_j)\bigr]\ge d \qquad (29)
$$

for, in the case of errors-only decoding, all $\alpha_j=1$; of deletions-and-errors decoding, $\alpha_j=0$ or $1$; and of generalized minimum-distance decoding, $0\le\alpha_j\le 1$.

Under the assumption that the channel is memoryless and that there is no correlation between inputs, the probabilities $p_{cj}$ of a correct reception in class $C_j$ and $p_{ej}$ of an incorrect reception in class $C_j$ are constant and independent from symbol to symbol. Consider the random variable that for each symbol assumes the value $(1-\alpha_j)$ if the symbol is received correctly and is given weight $\alpha_j$, and $(1+\alpha_j)$ if the symbol is received incorrectly and given weight $\alpha_j$. These are then independent, identically distributed random variables with the common moment-generating function

$$
g(s)=\sum_{j}\Bigl[p_{cj}\,e^{s(1-\alpha_j)}+p_{ej}\,e^{s(1+\alpha_j)}\Bigr]. \qquad (30)
$$


Furthermore, the condition of Eq. 29 is just the condition that the sum of these $n$ random variables be greater than or equal to $d$. Letting $\delta=d/n$, we have by the Chernoff bound that the probability $\Pr(e)$ of error or failure is upperbounded by

$$
\Pr(e)\le e^{-nE'(\delta)}, \qquad (31)
$$

where

$$
E'(\delta)\equiv\max_{s\ge 0}\,[s\delta-\mu(s)], \qquad (32)
$$


$\mu(s)$ being the natural logarithm of the $g(s)$ of (30). This bound is valid for any particular assignment of the $\alpha_j$ to the reliability classes; however, we are free to vary the $\alpha_j$ to maximize this bound. Let

$$
E(\delta)=\max_{\alpha_j}E'(\delta)=\max_{s,\,\alpha_j}\,[s\delta-\mu(s)].
$$

It is convenient and illuminating to maximize first over the assignment of the $\alpha_j$:

$$
E(\delta)=\max_{s}\,[s\delta-\mu_m(s)], \qquad (33)
$$

where

$$
\mu_m(s)\equiv\min_{\alpha_j}\mu(s)=\min_{\alpha_j}\ln g(s)=\ln\min_{\alpha_j}g(s)=\ln g_m(s). \qquad (34)
$$


$\mu(s)$ is minimized by minimizing $g(s)$, and we shall now do this for the three types of minimum-distance decoding.

For errors-only decoding, there is no choice in the $\alpha_j$, all of which must equal one; therefore,

$$
g_m(s)=g(s)=e^{2s}\Bigl[\sum_j p_{ej}\Bigr]+\Bigl[\sum_j p_{cj}\Bigr]. \qquad (35)
$$

The total probability of symbol error is $p=\sum_j p_{ej}$. Making the substitutions $s'=2s$ and $\delta'=\delta/2$, we see that this bound degenerates into the Chernoff bound of section 2.4a on getting more than $d/2$ symbol errors in a sequence of $n$ transmissions, as might be expected.

For deletions-and-errors decoding, we can assign some outputs to a set $E$ of erased symbols and the remainder to a set $R$ of reliable symbols; we want to choose these sets so as to minimize $g(s)$. In symbols, $\alpha_j=0$, all $j\in E$, and $\alpha_j=1$, all $j\in R$, so

$$
g(s)=e^{2s}\Bigl[\sum_{j\in R}p_{ej}\Bigr]+e^{s}\Bigl[\sum_{j\in E}\bigl(p_{ej}+p_{cj}\bigr)\Bigr]+\Bigl[\sum_{j\in R}p_{cj}\Bigr].
$$

Assigning a particular output to $E$ or $R$ makes no difference if

$$
e^{2s}p_{ej}+p_{cj}=e^{s}\bigl(p_{ej}+p_{cj}\bigr),
$$

or

$$
L_j\equiv\frac{p_{ej}}{p_{cj}}=e^{-s}.
$$

Here, we have defined $L_j$, the error-likelihood ratio, as $p_{ej}/p_{cj}$; we shall discuss the significance of $L_j$ below. We see that to minimize $g(s)$, we let $j\in E$ if $L_j>e^{-s}$ and $j\in R$ if $L_j\le e^{-s}$; that is, comparison of $L_j$ to a threshold that is a function of $s$ is the optimum criterion of whether to erase or not. Then


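$$
g_m(s)=e^{2s}p_2(s)+e^{s}p_1(s)+p_0(s),
$$

where

$$
\begin{aligned}
p_2(s) &\equiv \sum_{j\in R}p_{ej},\\
p_1(s) &\equiv \sum_{j\in E}\bigl(p_{ej}+p_{cj}\bigr),\\
p_0(s) &\equiv 1-p_1(s)-p_2(s),
\end{aligned}
\qquad
\begin{aligned}
j\in R &\ \text{ if } L_j\le e^{-s},\\
j\in E &\ \text{ if } L_j> e^{-s}.
\end{aligned}
\qquad (36)
$$

Finally, for generalized minimum-distance decoding, we have

$$
g(s)=\sum_j\Bigl[p_{cj}\,e^{s(1-\alpha_j)}+p_{ej}\,e^{s(1+\alpha_j)}\Bigr],
$$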

which we minimize with respect to a single $\alpha_j$ by setting the derivative

$$
\frac{\partial g(s)}{\partial\alpha_j}=-s\,p_{cj}\,e^{s(1-\alpha_j)}+s\,p_{ej}\,e^{s(1+\alpha_j)}
$$

to zero, as long as $0\le\alpha_j\le 1$. The resulting condition is

$$
e^{-2s\alpha_j}=\frac{p_{ej}}{p_{cj}}=L_j,
$$

or

$$
\alpha_j=-\frac{\ln L_j}{2s}.
$$

Whenever $L_j$ is such that $-(\ln L_j)/2s>1$, we let $\alpha_j=1$, while whenever $-(\ln L_j)/2s<0$, we let $\alpha_j=0$. Then


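$$
g_m(s)=e^{2s}\sum_{j\in R}p_{ej}+\sum_{j\in R}p_{cj}+e^{s}\sum_{j\in E}\bigl(p_{ej}+p_{cj}\bigr)+e^{s}\sum_{j\in G}2\sqrt{p_{ej}\,p_{cj}},
$$

where

$$
\begin{aligned}
j\in R &\ \text{ if } L_j\le e^{-2s},\\
j\in E &\ \text{ if } L_j\ge 1,\\
j\in G &\ \text{ otherwise},
\end{aligned}
\qquad (37)
$$

and we have used $p_{cj}\,e^{-s\alpha_j}=p_{ej}\,e^{s\alpha_j}=\sqrt{p_{ej}\,p_{cj}}$ when $j\in G$.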
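The three minimizations can be checked against brute force on a small channel. In the Python sketch below, the class probabilities `pc` (correct reception) and `pe` (incorrect reception) are illustrative numbers, not data from this report, and the function names are hypothetical. The erasure rule compares $L_j$ with $e^{-s}$ as in Eq. 36, and the generalized weights are $-\ln L_j/2s$ clipped to $[0,1]$ as in Eq. 37.

```python
import math
from itertools import product


def g(s, alpha, pc, pe):
    """Eq. 30: sum over classes of p_cj e^{s(1 - alpha_j)} + p_ej e^{s(1 + alpha_j)}."""
    return sum(c * math.exp(s * (1 - a)) + e * math.exp(s * (1 + a))
               for a, c, e in zip(alpha, pc, pe))


def erasure_weights(s, pc, pe):
    """Deletions-and-errors rule: alpha_j = 0 (erase) iff L_j = p_ej/p_cj > e^{-s}."""
    return [0.0 if e / c > math.exp(-s) else 1.0 for c, e in zip(pc, pe)]


def gmd_weights(s, pc, pe):
    """Generalized minimum distance: alpha_j = -ln(L_j)/(2s), clipped to [0, 1]."""
    return [min(1.0, max(0.0, -math.log(e / c) / (2 * s))) for c, e in zip(pc, pe)]


pc = [0.60, 0.20, 0.08, 0.02]   # illustrative Pr(correct reception, class j)
pe = [0.01, 0.03, 0.04, 0.02]   # illustrative Pr(incorrect reception, class j)
for s in (0.2, 0.5, 1.0, 2.0):
    g_eo = g(s, [1.0] * len(pc), pc, pe)              # errors-only
    g_de = g(s, erasure_weights(s, pc, pe), pc, pe)   # deletions-and-errors
    g_gmd = g(s, gmd_weights(s, pc, pe), pc, pe)      # generalized minimum distance
    brute = min(g(s, list(w), pc, pe) for w in product((0.0, 1.0), repeat=len(pc)))
    print(f"s={s}: {g_eo:.5f} >= {g_de:.5f} >= {g_gmd:.5f}; "
          f"threshold rule optimal: {math.isclose(g_de, brute)}")
```

Because $g(s)$ is a sum of per-class terms, each class can be optimized separately, which is why a simple threshold (or a clipped closed-form weight) matches the exhaustive search.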


Let us examine for a moment the error-likelihood ratio $L_j$. Denote by $\Pr(x_i,y_j)$ the probability of transmitting $x_i$ and receiving $y_j$; the ratio $L_{ij}$ between the probability that $x_i$ was not transmitted, given the reception of $y_j$, and the probability that $x_i$ was transmitted (the alternative hypothesis) is

$$
L_{ij}\equiv\frac{1-\Pr(x_i\mid y_j)}{\Pr(x_i\mid y_j)}=\frac{\sum_{i'\neq i}\Pr(x_{i'}\mid y_j)}{\Pr(x_i\mid y_j)}=\frac{\sum_{i'\neq i}\Pr(x_{i'},y_j)}{\Pr(x_i,y_j)}.
$$

The optimum decision rule for the inner decoder is to choose that $x_i$ for which $\Pr(x_i\mid y_j)$ is maximum, or equivalently for which $L_{ij}$ is minimum. But now for this $x_i$,

$$
p_{cj}=\Pr(x_i,y_j)\quad\text{and}\quad p_{ej}=\sum_{i'\neq i}\Pr(x_{i'},y_j).
$$

Thus

$$
L_j=\min_i L_{ij}.
$$

We have seen that the optimum reliability weights are determined by the $L_j$ (for generalized minimum-distance decoding, $\alpha_j$ is proportional to $-\ln L_j$ between the clipping limits of zero and one); thus the error-likelihood ratio is theoretically central to the inner decoder's decision making, both to its choice of a particular output and to its adding of reliability information to that choice. (The statistician will recognize the $L_{ij}$ as sufficient statistics, and will appreciate that the simplification of minimum-distance decoding consists in its requiring of these statistics only the smallest, and the corresponding value of $i$.)
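The inner decoder's two outputs, its choice of input and the error-likelihood ratio attached to that choice, follow directly from the joint probabilities at one output. A minimal Python sketch; the function name and the example probabilities are hypothetical:

```python
import math


def inner_decision(joint):
    """Given Pr(x_i, y_j) for every input i at one output y_j, return the
    inner decoder's choice i (maximum a posteriori) and L_j = min_i L_ij,
    where L_ij = sum_{i' != i} Pr(x_i', y_j) / Pr(x_i, y_j)."""
    total = sum(joint)
    ratios = [(total - p) / p if p > 0 else math.inf for p in joint]
    i_best = min(range(len(joint)), key=lambda i: ratios[i])
    return i_best, ratios[i_best]


# Hypothetical joint probabilities for one output of a three-input channel.
print(inner_decision([0.02, 0.10, 0.03]))   # chooses input 1, with L_j close to 0.5
```

Only the smallest ratio and its index are returned, mirroring the remark above that minimum-distance decoding uses just that part of the sufficient statistics.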

The minimum value that $L_j$ can assume is zero; the maximum, when all $q$ inputs are equally likely, given $y_j$, is $q-1$. When $q=2$, therefore, $L_j$ cannot exceed one. It follows that for generalized minimum-distance decoding with binary inputs the set $E$ of Eq. 37 is empty (apart from outputs with $L_j=1$, which would be assigned $\alpha_j=0$ in any case).

In the discussion of the Chernoff bound we asserted that it was valid only when $\delta\ge\mu'(0)$, or in this case $\delta\ge\mu_m'(0)$. When $s=0$, the sets $R$ and $E$ of (36) and (37) become identical, namely

$$
\begin{aligned}
j\in R &\ \text{ if } L_j\le 1,\\
j\in E &\ \text{ if } L_j>1.
\end{aligned}
$$

Therefore $\mu_m'(0)$ is identical for deletions-and-errors and generalized minimum-distance decoding. If there is no output with $L_j>1$ (as will always be true when there are only two inputs), then $\mu_m'(0)$ for these two schemes will equal that for errors-only decoding, too; otherwise it will be less. In the latter case, the use of deletions permits the probability of error to decrease exponentially with $n$ for a smaller minimum distance $n\delta$, hence a larger rate, than without deletions.

We now maximize over $s$. From Eqs. 35-37, $\mu_m(s)$ has the general form

$$
\mu_m(s)=\ln\bigl[e^{2s}p_2(s)+e^{s}p_1(s)+p_0(s)\bigr].
$$

Setting the derivative of $s\delta-\mu_m(s)$ to zero, we obtain

$$
\delta=\mu_m'(s)=\frac{2e^{2s}p_2(s)+e^{s}p_1(s)+e^{2s}p_2'(s)+e^{s}p_1'(s)+p_0'(s)}{e^{2s}p_2(s)+e^{s}p_1(s)+p_0(s)}, \qquad (38)
$$

which has a solution when $2\ge\delta\ge\mu_m'(0)$. Substituting the value of $s$ thus obtained in $s\delta-\mu_m(s)$, we obtain $E(\delta)$, and thus a bound of the form

$$
\Pr(e)\le e^{-nE(\delta)}. \qquad (39)
$$
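Since $\mu_m(s)$ need not be convex in $s$ for the deletion schemes (the sets $R$, $E$, and $G$ change with $s$), the following Python sketch simply maximizes $s\delta-\mu_m(s)$ over a fine grid of $s$ rather than solving Eq. 38. The class probabilities are illustrative numbers, not data from this report, and the grid limits `s_max` and `steps` are assumptions; the function names are hypothetical.

```python
import math


def class_terms(s, c, e):
    """One class's contribution to g_m(s) under errors-only, deletions-and-errors,
    and generalized minimum-distance decoding (Eqs. 35-37)."""
    reliable = c + math.exp(2 * s) * e        # alpha_j = 1
    erased = math.exp(s) * (c + e)            # alpha_j = 0
    L = e / c
    if L <= math.exp(-2 * s):
        gmd = reliable
    elif L >= 1:
        gmd = erased
    else:
        gmd = 2 * math.exp(s) * math.sqrt(c * e)
    return reliable, min(reliable, erased), gmd


def exponents(delta, pc, pe, s_max=10.0, steps=10000):
    """E(delta) = max over s >= 0 of [s delta - ln g_m(s)], by grid search over s,
    for errors-only, deletions-and-errors, and GMD decoding, in that order."""
    best = [0.0, 0.0, 0.0]                    # s = 0 always gives exponent 0
    for k in range(1, steps + 1):
        s = s_max * k / steps
        g = [0.0, 0.0, 0.0]
        for c, e in zip(pc, pe):
            for m, term in enumerate(class_terms(s, c, e)):
                g[m] += term
        best = [max(b, s * delta - math.log(gm)) for b, gm in zip(best, g)]
    return best


pc = [0.60, 0.20, 0.08, 0.02]   # illustrative Pr(correct reception, class j)
pe = [0.01, 0.03, 0.04, 0.02]   # illustrative Pr(incorrect reception, class j)
for delta in (0.5, 1.0):
    print(delta, exponents(delta, pc, pe))
```

For errors-only decoding the result should agree with the binary bound of section 2.4a evaluated at $\delta/2$ with $p=\sum_j p_{ej}$, as noted after Eq. 35; the other two exponents can only be larger, since their minimizations range over larger sets of weights.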

We would prefer a bound that guaranteed the existence of a code of dimensionless rate $r$ and length $n$ with probability of decoding failure or error bounded by

$$
\Pr(e)\le e^{-nE(r)}.
$$

The Gilbert bound asserts for large $n$ the existence of a code with a $q$-symbol alphabet, minimum distance $\delta n$, and dimensionless rate $r$, where

$$
r\ge 1-\frac{\mathcal{H}(\delta)}{\ln q}-\frac{\delta\ln(q-1)}{\ln q}.
$$

Substituting in (39) the value of $\delta$ determined by $r$ through this relation, taken with the equality sign, gives us the bound we want.
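The substitution requires inverting the Gilbert relation, which has no closed form in $\delta$. A minimal Python sketch of that step by bisection (the function names are hypothetical); it uses the fact that the right side falls monotonically from 1 at $\delta=0$ to 0 at $\delta=(q-1)/q$.

```python
import math


def gilbert_rate(delta, q):
    """Right side of the Gilbert bound: 1 - H(delta)/ln q - delta ln(q-1)/ln q."""
    h = -sum(t * math.log(t) for t in (delta, 1 - delta) if t > 0)
    return 1 - h / math.log(q) - delta * math.log(q - 1) / math.log(q)


def gilbert_delta(r, q, tol=1e-12):
    """Solve gilbert_rate(delta, q) = r by bisection on 0 <= delta <= (q-1)/q."""
    lo, hi = 0.0, (q - 1) / q
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gilbert_rate(mid, q) > r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for q in (2, 16, 256):
    print(q, [round(gilbert_delta(r, q), 4) for r in (0.25, 0.5, 0.75)])
```

The resulting $\delta(r)$ is what would then be inserted into $E(\delta)$ to express the exponent as a function of rate.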


Fig.  3.  Minimum-distance  decoding  exponents  for  a  Gaussian 
channel  with  L  =  3. 
c. Computational Comparisons

To get some feeling for the relative performance of these three progressively more involved minimum-distance decoding schemes, the error exponents for each of them were computed over a few simple channels, with the use of the bounds discussed above.

In order to be able to compute the error-likelihood ratio easily, we considered only channels with two inputs. Figure 3 displays a typical result; these curves are for a channel with additive Gaussian noise of unit variance and a signal of amplitude either +3 or -3, which is a high signal-to-noise ratio. At lower signal-to-noise ratios the curves are closer. We also considered a two-dimensional Rayleigh fading channel for various signal-to-noise ratios.

For these channels, at least, we observed that though improvement is, of course, obtained in going from one decoding scheme to the next more complicated, this improvement is quite slight at high rates, and even at low rates, where improvement is greatest, the exponent for generalized minimum-distance decoding is never greater than twice that for errors-only decoding. The step between errors-only and deletions-and-errors decoding is comparable to, and slightly greater than, the step between the latter and generalized minimum-distance decoding.

From these computations and some of the computations that will be reported in Section VI, it would seem that the use of deletions offers substantial improvements in performance only when very poor outputs (with error-likelihood ratios greater than one) exist, and otherwise that only moderate returns are to be expected.
III. BOSE-CHAUDHURI-HOCQUENGHEM CODES

Our purpose now is to make the important class of BCH codes accessible to the reader with little previous background, and to do so with emphasis on the nonbinary BCH codes, particularly the RS codes, whose powerful properties are insufficiently known.

The presentation is quite single-minded in its omission of all but the essentials needed to understand BCH codes. The reader who is interested in a more rounded exposition is referred to the comprehensive and still timely book by Peterson. In particular, our treatment of finite fields will be unsatisfactory to the reader who desires some depth of understanding about the properties that we assert; Albert is a widely recommended mathematical text.

3.1 FINITE FIELDS

Mathematically, the finite field GF(q) consists of q elements that can be added, subtracted, multiplied, and divided almost as numbers. There is always a field element called zero (0), which has the property that any field element $\beta$ plus or minus zero is $\beta$. There is also an element called one (1), such that $\beta\cdot 1=\beta$; further, $\beta\cdot 0=0$. If $\beta$ is not zero, it has a multiplicative inverse, which is that unique field element satisfying the equation $\beta\cdot\beta^{-1}=1$; division by $\beta$ is accomplished by multiplication by $\beta^{-1}$.

The simplest examples of finite fields are the integers modulo a prime number p. For instance, take p = 5; there are 5 elements in the field, which we shall write I, II, III, IV, and V, to distinguish them from the integers to which they correspond. Addition, subtraction, and multiplication are carried out by converting these numbers into their integer equivalents and doing arithmetic modulo 5. For instance, I + III = IV, since 1 + 3 = 4 mod 5; III + IV = II, since 3 + 4 = 2 mod 5; I · III = III, since 1 · 3 = 3 mod 5; III · IV = II, since 3 · 4 = 2 mod 5. Figure 4 gives the complete addition and multiplication tables for GF(5).
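The tables of Fig. 4 can be regenerated mechanically. A minimal Python sketch, representing each element by its integer equivalent (so V is stored as 0) and printing the tables with the Roman-numeral names used in the text; the function names are illustrative:

```python
P = 5
NAME = {1: "I", 2: "II", 3: "III", 4: "IV", 0: "V"}   # V stands for the integer 0


def add(a, b):
    return (a + b) % P


def sub(a, b):
    return (a - b) % P


def mul(a, b):
    return (a * b) % P


def inv(a):
    """Multiplicative inverse by search; the zero element V has none."""
    if a % P == 0:
        raise ZeroDivisionError("V has no inverse")
    return next(b for b in range(1, P) if mul(a, b) == 1)


def print_table(op, symbol):
    order = [1, 2, 3, 4, 0]                             # I, II, III, IV, V
    print(symbol.ljust(4) + "".join(NAME[b].ljust(4) for b in order))
    for a in order:
        print(NAME[a].ljust(4) + "".join(NAME[op(a, b)].ljust(4) for b in order))


print_table(add, "+")
print_table(mul, "*")
```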
Fig. 4. Arithmetic in GF(5).


Note that V + $\beta$ = $\beta$, if $\beta$ is any member of the field; therefore, V must be the zero element. Also V · $\beta$ = V. I · $\beta$ = $\beta$, so I must be the one element. Since I · I = II · III = IV · IV = I, we have I$^{-1}$ = I, II$^{-1}$ = III, III$^{-1}$ = II, and IV$^{-1}$ = IV.
In Figure 5 by these rules we have constructed a chart of the first 5 powers of the field elements. Observe that in every case $\beta^5=\beta$, while with the exception of the zero element V, $\beta^4$ = I. Furthermore, both II and III have the property that their first four powers are distinct, and therefore yield the 4 nonzero field elements. Therefore if we let $\alpha$ denote the element II, say, then I = $\alpha^0=\alpha^4$, II = $\alpha$, III = $\alpha^3$, and IV = $\alpha^2$, which gives us a convenient representation of the field elements for multiplication and division, in the same way that the logarithmic relationship $x=10^{\log_{10}x}$ gives us a convenient representation of the real numbers for multiplication and division.
Fig. 5. Powers of the field elements.


Fig.  6.  Representations  for  GF(5). 


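Figure 6 displays the two representations of GF(5) that are convenient for addition and multiplication. If $\beta$ corresponds to $a$ and $\alpha^b$, and $\gamma$ corresponds to $c$ and $\alpha^d$, then $\beta+\gamma\leftrightarrow a+c \bmod 5$, $\beta-\gamma\leftrightarrow a-c \bmod 5$, $\beta\cdot\gamma\leftrightarrow\alpha^{[b+d \bmod 4]}$, and $\beta\div\gamma\leftrightarrow\alpha^{[b-d \bmod 4]}$, where $\leftrightarrow$ means 'corresponds to' and the 'mod 4' in the exponent arises since $\alpha^4=\alpha^0=1$.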
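The powers chart and the logarithmic representation can be built and checked in a few lines. In this Python sketch, II (the integer 2) is taken as the primitive element $\alpha$, as in the text; the table names `EXP` and `LOG` and the helper functions are illustrative.

```python
P = 5
ALPHA = 2                                       # the element II, taken as alpha


def is_primitive(a):
    """True if the first P - 1 powers of a are distinct (all nonzero elements)."""
    return len({pow(a, k, P) for k in range(P - 1)}) == P - 1


EXP = [pow(ALPHA, k, P) for k in range(P - 1)]  # EXP[k] = alpha^k
LOG = {x: k for k, x in enumerate(EXP)}         # LOG[alpha^k] = k


def mul_via_logs(b, c):
    """beta * gamma <-> alpha^[(b + d) mod 4]; zero handled separately."""
    if b == 0 or c == 0:
        return 0
    return EXP[(LOG[b] + LOG[c]) % (P - 1)]


def div_via_logs(b, c):
    """beta / gamma <-> alpha^[(b - d) mod 4], for nonzero gamma."""
    if c == 0:
        raise ZeroDivisionError("division by the zero element")
    return 0 if b == 0 else EXP[(LOG[b] - LOG[c]) % (P - 1)]


print([a for a in range(1, P) if is_primitive(a)])  # [2, 3]: II and III
print(EXP, LOG)
```

Python's `%` returns a non-negative result for a negative left operand, so the division exponent needs no extra adjustment.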

The prime field of most practical interest is GF(2), whose two elements are simply 0 and 1. Addition and multiplication tables for GF(2) appear in Fig. 7.

It can be shown that the general finite field GF(q) has $q=p^m$ elements, where p is again a prime, called the characteristic of the field, and m is an arbitrary integer.

As with GF(5), we find it possible to construct two representations of GF(q), one convenient for addition, one for multiplication. For addition, an element $\beta$ of GF(q) is represented by a sequence of m integers, $b_1,b_2,\ldots,b_m$. To add $\beta$ to $\gamma$, we add $b_1$ to $c_1$, $b_2$ to $c_2$, and so forth, all modulo p. For multiplication, it is always possible to find a primitive element $\alpha$, such that the first $q-1$ powers of $\alpha$ yield the $q-1$ nonzero field elements. Thus $\alpha^{q-1}=\alpha^0=1$ (or else the first $q-1$ powers would not be distinct), and multiplication is accomplished by adding exponents mod $q-1$. We have, if $\beta$ is any nonzero element, $\beta^{q-1}=1$, and if $\beta$ is any element, zero or not, $\beta^q=\beta$.

    +  | 0  1          ·  | 0  1
    0  | 0  1          0  | 0  0
    1  | 1  0          1  | 0  1

Fig. 7. Tables for GF(2).

Thus all that remains to be specified is the properties of GF(q) needed to make the one-to-one identification between the addition and multiplication representations. Though this is easily done by using polynomials with coefficients from GF(p), it is not necessary to know precisely what this identification is to understand the sequel. (In fact, avoiding this point is the essential simplification of our presentation.) We note only that the zero element must be represented by a sequence of m zeros.

As an example of the general finite field, we use GF(4) = GF($2^2$), for which an addition table, multiplication table, and representation table are displayed in Fig. 8.

Note that GF(4) contains two elements that can be identified as the two elements of GF(2), namely 0 and 1. In this case GF(2) is said to be a subfield of GF(4). In general, GF(q') is a subfield of GF(q) if and only if $q=q'^a$, where a is an integer. In particular, if $q=p^m$, the prime field GF(p) is a subfield of GF(q).
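The text deliberately leaves the identification between the two representations unspecified. Purely as an illustration, one standard way to realize GF(4) is with polynomials over GF(2) reduced modulo the primitive polynomial $x^2+x+1$ (our assumption; the text does not fix this choice). In the Python sketch below an element is a 2-bit integer whose bits are its two GF(2) coordinates, so addition is place-by-place XOR; `gf_add` and `gf_mul` are illustrative names.

```python
M = 2            # GF(4) = GF(2^M)
POLY = 0b111     # x^2 + x + 1, assumed primitive polynomial


def gf_add(a, b):
    """Place-by-place addition modulo 2 (bitwise XOR); also subtraction."""
    return a ^ b


def gf_mul(a, b):
    """Polynomial product over GF(2), reduced modulo POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):          # degree reached M: subtract (XOR) the modulus
            a ^= POLY
    return result


ALPHA = 0b10     # the polynomial x, a primitive element
powers = [1]
for _ in range(3):
    powers.append(gf_mul(powers[-1], ALPHA))
print(powers)    # [1, 2, 3, 1]: alpha^2 = alpha + 1 (binary 11) and alpha^3 = 1
```

The elements 0 and 1 (binary 00 and 01) are closed under both operations, which is the subfield GF(2) mentioned above.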

We shall need some explanation to understand our later comments on shortened RS codes. For addition, we have expressed the elements of GF(q) as a sequence of m elements from GF(p), and added place-by-place according to the addition rules of GF(p), that is, modulo p. Multiplication of an element of GF(q) by some member b of the subfield GF(p) amounts to multiplication by an integer b modulo p, which amounts to b-fold addition of the element of GF(q) to itself, which finally amounts to term-by-term multiplication of each of the m terms of the element by b mod p. (It follows that multiplication of any element of GF($p^m$) by p gives a sequence of zeros, that is, the zero element of GF($p^m$).) It is perhaps plausible that the following facts are true, as they are: if $q=q'^b$, elements from GF(q) can always be expressed as a sequence of b elements from GF(q'), so that addition of 2 elements from GF(q) can be carried out place-by-place according to the rules of addition in GF(q'), and multiplication of an element from GF(q) by an element $\beta$ from GF(q') can be carried out by term-by-term multiplication of each element in the sequence representing the element of GF(q) by $\beta$, according to the rules of multiplication in GF(q').

Fig. 8. Tables for GF(4).

As an example, we can write the elements of GF(16) as pairs of elements of GF(4), where α is a primitive element of GF(4). Then, by using Fig. 8, $(1, \alpha) + (\alpha, \alpha) = (\alpha^2, 0)$, for example, while $\alpha \cdot (\alpha, 1) = (\alpha^2, \alpha)$.
We have observed that $p \cdot \beta = 0$ for all elements β in a field of characteristic p. In particular, if p = 2, then $\beta + \beta = 0$, so that $\beta = -\beta$ and addition is the same as subtraction in a field of characteristic two. Furthermore, by the binomial theorem,
$$(\beta+\gamma)^p = \beta^p + \binom{p}{1}\beta^{p-1}\gamma + \cdots + \binom{p}{p-1}\beta\gamma^{p-1} + \gamma^p .$$
Every term except the first and last is multiplied by p (p divides $\binom{p}{i}$ for $0 < i < p$) and is therefore zero. Hence $(\beta+\gamma)^p = \beta^p + \gamma^p$ when β and γ are elements of a field of characteristic p.
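The identity $(\beta+\gamma)^p = \beta^p + \gamma^p$ is easy to confirm exhaustively in a small field of characteristic two. This sketch assumes GF(16) is built as polynomials over GF(2) modulo $x^4 + x + 1$, which is one valid choice and is not prescribed by the text. It checks that squaring and fourth powers are additive, that cubing is not, and that $\beta^{16} = \beta$ for every element.

```python
def gf16_mul(u: int, v: int) -> int:
    """Multiply two GF(16) elements stored as 4-bit integers (bit i = coefficient of x^i).
    Assumption: the field is built modulo x^4 + x + 1."""
    prod = 0
    for bit in range(4):
        if (v >> bit) & 1:
            prod ^= u << bit            # carry-less (mod 2) polynomial product
    for deg in range(6, 3, -1):         # clear degrees 6, 5, 4 using x^4 = x + 1
        if (prod >> deg) & 1:
            prod ^= 0b10011 << (deg - 4)
    return prod


def gf16_pow(u: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf16_mul(result, u)
    return result


def power_is_additive(e: int) -> bool:
    """Check (b + g)^e == b^e + g^e for every pair; '+' is XOR in characteristic two."""
    return all(
        gf16_pow(b ^ g, e) == gf16_pow(b, e) ^ gf16_pow(g, e)
        for b in range(16)
        for g in range(16)
    )


print(power_is_additive(2), power_is_additive(4), power_is_additive(3))  # True True False
print(all(gf16_pow(b, 16) == b for b in range(16)))                   # True
```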

3.2  LINEAR  CODES 

We know from the coding theorem that codes containing an exponentially large number of code words are required to achieve an exponentially low probability of error. Linear codes$^{4,22}$ can contain such a great number of words, yet remain feasible to generate; they can facilitate minimum-distance decoding, as we shall see. Finally, as a class they can be shown to obey the coding theorem. They have therefore been overwhelmingly the codes most studied.

Assume that we have a channel with q inputs, where q is a prime power, so that we can identify the different inputs with the elements of a finite field GF(q). A code word f of length n for such a channel consists of a sequence of n elements from GF(q). We shall write $f = (f_1, f_2, \ldots, f_n)$, where $f_i$ occupies the i-th place. The weight w(f) of f is defined as the number of nonzero elements in f.

A linear combination of two words $f_1$ and $f_2$ is written $\beta f_1 + \gamma f_2$, where β and γ are each elements of GF(q), and ordinary vectorial (that is, place-by-place) addition in GF(q) is implied. For example, if $f_1 = (f_{11}, f_{12}, f_{13})$ and $f_2 = (f_{21}, f_{22}, f_{23})$, then
$$f_1 - f_2 = (f_{11} - f_{21},\ f_{12} - f_{22},\ f_{13} - f_{23}).$$

A linear code of length n is a subset of the $q^n$ words of length n with the important property that any linear combination of words in the code yields another word in the code. A code is nondegenerate if all of its words are different; we consider only such codes.

Saying that the distance between two words $f_1$ and $f_2$ is d is equivalent to saying that the weight of their difference, $w(f_1 - f_2)$, is d. This is because $f_1 - f_2$ has zeros exactly in those places where the two words do not differ. In a linear code, moreover, $f_1 - f_2$ must be another code word $f_3$, so that if there are two code words separated by distance d there is a code word of weight d, and vice versa. The all-zero, zero-weight word must appear in every linear code, since $0 \cdot f_1 + 0 \cdot f_2$ is a valid linear combination of code words. Excluding this word, the minimum distance of a linear code is the minimum weight of any of its nonzero words.
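On a toy linear code, brute force confirms that the minimum distance equals the minimum nonzero weight. This sketch is only an illustration. It assumes the prime field GF(3), where field arithmetic is integer arithmetic modulo 3, and uses a hypothetical two-generator code of length 4 (the ternary "tetracode") chosen only for its small size.

```python
from itertools import product

Q = 3  # assumption: prime field GF(3), so field arithmetic is arithmetic mod 3
GENERATORS = [(1, 0, 1, 1), (0, 1, 1, 2)]  # hypothetical example code


def span(generators, q):
    """Every linear combination c_1 g_1 + ... + c_k g_k, computed place-by-place mod q."""
    n = len(generators[0])
    return {
        tuple(sum(c * g[i] for c, g in zip(coeffs, generators)) % q for i in range(n))
        for coeffs in product(range(q), repeat=len(generators))
    }


def weight(word):
    return sum(1 for x in word if x != 0)


def distance(u, v):
    return sum(1 for a, b in zip(u, v) if a != b)


code = span(GENERATORS, Q)
min_distance = min(distance(u, v) for u in code for v in code if u != v)
min_weight = min(weight(w) for w in code if any(w))
print(len(code), min_distance, min_weight)  # 9 3 3
```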

We shall be interested in the properties of sets of j different places, or sets of size j, which will be defined with reference to a given code. If the j places are such that there is no code word but the all-zero word with zeros in all j places, we say that these j places form a non-null set of size j for that code; otherwise they form a null set.

Suppose there is a set of k places such that there is one and only one code word corresponding to each of the $q^k$ possible assignments of elements from GF(q) to those k places. Then we call it an information set of size k; thus any code with an information set of size k has exactly $q^k$ code words. The remaining n - k places form a check set. An information set must be a non-null set, for otherwise there would be two or more words corresponding to the assignment of all zeros to the information set.

We now show that all linear codes have an information set, by showing the equivalence of the two statements: (i) there is an information set of size k for the code; (ii) the smallest non-null set has size k.

First, an information set of size k implies $q^k$ code words. Any set of size k - 1 or less admits no more than $q^{k-1}$ different assignments, so there must be at least two distinct code words that are the same in those places. Their difference, though not the all-zero word, is then zero in those places, so that any set of size k - 1 or less is a null set.

Conversely, suppose the smallest non-null set has size k. Then every subset of k - 1 of its places is a null set. Therefore, for each place p of the set, there is a code word f that is zero in all the other places of the set; f must be nonzero in the p-th place, because the full set is non-null. If f has β in the p-th place, then $\beta^{-1} \cdot f$ is a code word with a one in the p-th place and zeros in the remaining places of the set. The k words with this property are called generators. Clearly, their $q^k$ linear combinations yield $q^k$ code words that are distinct in the specified k places. (This is the property that makes linear codes easy to generate.) But there can be no more than $q^k$ words in the code, since otherwise all sets of size k would be null sets, by the argument above. Thus the smallest non-null set must be an information set.

Since every linear code has a smallest non-null set, every linear code has an information set and, for some k, $q^k$ code words. In fact, every non-null set of size k is an information set, since each of the $q^k$ code words must correspond to a different assignment of elements to those k places. We say such a code has k information symbols, n - k check symbols, and dimensionless rate k/n, and call it an (n, k) code on GF(q).
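The definitions of null sets and information sets translate directly into brute-force checks. This sketch is illustrative only. It assumes a prime field (here GF(2)) and uses a hypothetical length-4 code generated by 1100 and 0011, chosen because some of its 2-place sets are null and others are information sets. The helper names are hypothetical.

```python
from itertools import combinations, product


def span(generators, q):
    """All q^k place-by-place linear combinations of the generators (prime q, so mod-q arithmetic)."""
    n = len(generators[0])
    return {
        tuple(sum(c * g[i] for c, g in zip(coeffs, generators)) % q for i in range(n))
        for coeffs in product(range(q), repeat=len(generators))
    }


def is_null_set(code, places):
    """True if some nonzero code word has zeros in all of the given places."""
    return any(any(w) and all(w[i] == 0 for i in places) for w in code)


def is_information_set(code, places, q):
    """True if each of the q^j assignments to the j places occurs in exactly one code word."""
    projections = {tuple(w[i] for i in places) for w in code}
    return len(projections) == len(code) == q ** len(places)


def smallest_non_null_size(code, n):
    for j in range(n + 1):
        if any(not is_null_set(code, s) for s in combinations(range(n), j)):
            return j


# Hypothetical example code over GF(2), generated by 1100 and 0011.
code = span([(1, 1, 0, 0), (0, 0, 1, 1)], 2)
k = smallest_non_null_size(code, 4)
print(k, is_information_set(code, (0, 2), 2), is_information_set(code, (0, 1), 2))  # 2 True False
```

The last line reflects the equivalence proved above. The smallest non-null set has size k = 2, and a 2-place set is an information set exactly when it is non-null.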

If the minimum distance of a code is d, then the minimum weight of any nonzero code word is d, and the largest null set has size n - d. Therefore the smallest non-null set must have size n - d + 1 or less. It follows that the number of information symbols is n - d + 1 or less, and the number of check symbols d - 1 or greater. Clearly, for a given minimum distance we want k to be as large as possible. A code that has length n, minimum distance d, and exactly the maximum number of information symbols, n - d + 1, will be called a maximum code.

We now show that a code is maximum if and only if every set of size n - d + 1 is an information set. Suppose every such set is an information set. Then no set of size n - d + 1 is a null set, so no nonzero code word has weight d - 1 or less, and the minimum weight must be at least d. It cannot exceed d, since there would then have to be n - d or fewer information symbols; so the minimum weight is d. Conversely, if the code is maximum, then the minimum weight of a nonzero code word is d, so that no set of size n - d + 1 can be a null set. But then all such sets are information sets.
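The criterion can be tested exhaustively on a small example. This sketch assumes the prime field GF(3) and a hypothetical ternary code of length 4 generated by 1011 and 0112. It confirms that the code is a maximum code: k = n - d + 1, and every set of n - d + 1 places is an information set.

```python
from itertools import combinations, product

Q, N = 3, 4  # assumption: prime field GF(3), code length 4
GENERATORS = [(1, 0, 1, 1), (0, 1, 1, 2)]  # hypothetical example code

code = {
    tuple(sum(c * g[i] for c, g in zip(coeffs, GENERATORS)) % Q for i in range(N))
    for coeffs in product(range(Q), repeat=len(GENERATORS))
}
k = len(GENERATORS)                                    # the two generators are independent
d = min(sum(1 for x in w if x) for w in code if any(w))


def is_information_set(places):
    """Each assignment to these places must occur in exactly one code word."""
    projections = {tuple(w[i] for i in places) for w in code}
    return len(projections) == len(code) == Q ** len(places)


all_sets_ok = all(is_information_set(s) for s in combinations(range(N), N - d + 1))
print(d, k == N - d + 1, all_sets_ok)  # 3 True True
```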

For example, let us investigate the code that consists of all words f satisfying the equation
$$f_1 + f_2 + \cdots + f_n = \sum_{i=1}^{n} f_i = 0.$$
It is a linear code, since if $f_1$ and $f_2$ satisfy this equation, $f_3 = \beta f_1 + \gamma f_2$ also satisfies the equation. Let us assign elements from GF(q) arbitrarily to all places but the p-th. For there to be one and only one code word with these elements in these places, $f_p$ must be the unique solution to
$$f_p + \sum_{i \ne p} f_i = 0, \qquad\text{or}\qquad f_p = -\sum_{i \ne p} f_i .$$
Clearly, this specifies a unique $f_p$ that solves the equation. Since p is arbitrary, every set of n - 1 places is an information set. This code is thus a maximum code with length n, n - 1 information symbols, and minimum distance 2.
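Once a check place p is chosen, encoding this single-parity-check code is a one-line computation. This minimal sketch assumes the prime field GF(5), so that negation is arithmetic modulo 5. The function name `encode_parity` is hypothetical. Building the code for every choice of p gives the same set of words each time, which illustrates that every set of n - 1 places is an information set.

```python
from itertools import product

Q = 5  # assumption: prime field GF(5); the text allows any GF(q)


def encode_parity(info, p, n):
    """Put the n - 1 information symbols in every place except p, then set
    f_p = -(sum of the other symbols) mod Q, so that f_1 + ... + f_n = 0."""
    assert len(info) == n - 1
    word = list(info[:p]) + [0] + list(info[p:])
    word[p] = (-sum(word)) % Q
    return tuple(word)


n = 3
codes = [
    {encode_parity(info, p, n) for info in product(range(Q), repeat=n - 1)}
    for p in range(n)
]
print(all(c == codes[0] for c in codes))                          # True
print(min(sum(1 for x in w if x) for w in codes[0] if any(w)))    # 2
```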

a.  Weight  Distribution  of  Maximum  Codes 

In general, the number of code words of a given weight in a linear code is difficult or impossible to determine; for many codes even d, the minimum weight, is not accurately known. Surprisingly, determining the weight distribution of a maximum code presents no problems.

Suppose we have a maximum code of length n and minimum distance d, with symbols from GF(q). In such a code there are n - d + 1 information symbols, and, as we have seen, every set of n - d + 1 places must be an information set, which can be used to generate the complete set of code words.

Aside from the all-zero, zero-weight word, there are no code words of weight less than d. To find the number of code words of weight d, we reason as follows. Take an arbitrary set of d places, and consider the set of all code words that have zeros in all of the remaining n - d places. One of these words will be the all-zero word; the rest must have weight d, since no nonzero code word has weight less than d. Consider the information set consisting of the n - d excluded places plus any one place among the d chosen. By assigning zeros to the n - d excluded places and arbitrary elements to the last place, we can generate the entire set of code words that have zeros in all n - d excluded places. There are thus q such code words, of which q - 1 have weight d. Since this argument holds for an arbitrary set of d places, the total number of code words of weight d is $\binom{n}{d}(q-1)$.

Similarly, let $M_{d+a}$ denote the number of code words of weight d + a that are nonzero only in an arbitrary set of d + a places. Take as an information set the n - d - a excluded places plus any a + 1 of the d + a chosen places. We can then generate a total of $q^{a+1}$ code words with all zeros in the n - d - a excluded places. Not all of these will have weight d + a. For every subset of size d + i, $0 \le i \le a - 1$, there will be $M_{d+i}$ code words of weight d + i, all of which have zeros in all of the n - d - a excluded places. Subtracting also the all-zero word, we obtain
$$M_{d+a} = q^{a+1} - 1 - \sum_{i=0}^{a-1} \binom{d+a}{d+i} M_{d+i}.$$
From this recursion relation, there follows explicitly
$$M_{d+a} = (q-1) \sum_{i=0}^{a} (-1)^i \binom{d+a-1}{i} q^{a-i}.$$
Finally, since there are $M_{d+a}$ words of weight d + a in an arbitrary set of d + a places, the total number of code words of weight d + a is
$$N_{d+a} = \binom{n}{d+a} M_{d+a}.$$
We note that the summation in the expression for $M_{d+a}$ consists of the first a + 1 terms of the binomial expansion of $(q-1)^{d+a-1} q^{-(d-1)}$, so that $M_{d+a}/q^{a+1} \to 1$ as $q \to \infty$. We may also upperbound $M_{d+a}$. When we generate the $q^{a+1}$ code words that have all zeros in an arbitrary n - d - a places, only those having no zeros in the remaining a + 1 information places can have weight d + a, so that
$$M_{d+a} \le (q-1)^{a+1}.$$
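The recursion, the closed form, and the upper bound can be cross-checked numerically. This sketch is an illustration and not part of the original derivation. For the brute-force count it uses the single-parity-check code of length 4 over GF(5), a maximum code with d = 2. The prime field and the parameters are assumptions chosen to keep the enumeration small.

```python
from itertools import product
from math import comb


def m_recursive(q, d, a_max):
    """M_{d+a}, a = 0..a_max, from M_{d+a} = q^(a+1) - 1 - sum_{i<a} C(d+a, d+i) M_{d+i}."""
    m = []
    for a in range(a_max + 1):
        m.append(q ** (a + 1) - 1 - sum(comb(d + a, d + i) * m[i] for i in range(a)))
    return m


def m_explicit(q, d, a):
    """M_{d+a} = (q - 1) * sum_{i=0}^{a} (-1)^i C(d+a-1, i) q^(a-i)."""
    return (q - 1) * sum((-1) ** i * comb(d + a - 1, i) * q ** (a - i) for i in range(a + 1))


def weight_distribution(n, d, q):
    """N_{d+a} = C(n, d+a) M_{d+a} for a maximum code of length n over GF(q)."""
    return {d + a: comb(n, d + a) * m_explicit(q, d, a) for a in range(n - d + 1)}


# Brute force on a hypothetical maximum code: words of length 4 over GF(5) whose symbols sum to 0.
q, n, d = 5, 4, 2
counts = {}
for word in product(range(q), repeat=n):
    w = sum(1 for x in word if x)
    if sum(word) % q == 0 and w > 0:
        counts[w] = counts.get(w, 0) + 1

print(weight_distribution(n, d, q))            # {2: 24, 3: 48, 4: 52}
print(counts == weight_distribution(n, d, q))  # True
```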

3.3 REED-SOLOMON CODES

We  can  now  introduce  Reed-Solomon  codes,  whose  properties  follow  directly  from 
those  of  van  der  Monde  matrices. 

a.  Van  der  Monde  Matrices 

An (n+1) × (n+1) van der Monde matrix has the general form
$$\begin{pmatrix} 1 & a_0 & a_0^2 & \cdots & a_0^n \\ 1 & a_1 & a_1^2 & \cdots & a_1^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_n & a_n^2 & \cdots & a_n^n \end{pmatrix},$$
where the $a_i$ are members of some field. The determinant of this matrix, D, also a member of the field, is a polynomial in the $a_i$ in which no $a_i$ appears to a power greater than n. Furthermore, since the determinant is zero if any two rows are the same, this polynomial must contain as factors $a_i - a_j$ for all $i > j$, so that
$$D = D' \prod_{i>j} (a_i - a_j).$$
But the polynomial $\prod_{i>j}(a_i - a_j)$ already contains each $a_i$ to the n-th power, so D' can only be a constant. The coefficient of $1 \cdot a_1 \cdot a_2^2 \cdots a_n^n$ (the product of the diagonal entries) must be one in this polynomial. Hence D' = 1, and
$$D = \prod_{i>j} (a_i - a_j).$$

Now suppose that all the $a_i$ are distinct. Then $a_i - a_j \ne 0$ for $i \ne j$, and since the $a_i$ are members of a field, a product of nonzero terms cannot be zero. Therefore the determinant D is nonzero if and only if the $a_i$ are distinct.

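Similarly,
$$\det\begin{pmatrix} a_0^{m_0} & a_0^{m_0+1} & \cdots & a_0^{m_0+n} \\ a_1^{m_0} & a_1^{m_0+1} & \cdots & a_1^{m_0+n} \\ \vdots & \vdots & & \vdots \\ a_n^{m_0} & a_n^{m_0+1} & \cdots & a_n^{m_0+n} \end{pmatrix} = \prod_i a_i^{m_0} \prod_{i>j} (a_i - a_j),$$
since each row is $a_i^{m_0}$ times the corresponding row of the van der Monde matrix. Thus the determinant of such a matrix, when $m_0 \ne 0$, is nonzero if and only if the $a_i$ are distinct and nonzero.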
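Both determinant formulas can be checked exhaustively over a small field. This sketch assumes the prime field GF(7), where inverses come from Fermat's little theorem. It computes determinants by ordinary Gaussian elimination and compares them with $\prod_i a_i^{m_0} \prod_{i>j}(a_i - a_j)$ for every 3-tuple of field elements. The helper names are illustrative.

```python
from itertools import product

P = 7  # assumption: prime field GF(7); inverses via Fermat, x^(P-2)


def det_mod_p(matrix, p=P):
    """Determinant over GF(p) by Gaussian elimination."""
    m = [list(row) for row in matrix]
    n = len(m)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] % p), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                       # a row swap flips the sign
        det = det * m[col][col] % p
        inv = pow(m[col][col], p - 2, p)
        for r in range(col + 1, n):
            factor = m[r][col] * inv % p
            for c in range(col, n):
                m[r][c] = (m[r][c] - factor * m[col][c]) % p
    return det % p


def vandermonde(a, m0, p=P):
    """Row i is (a_i^m0, a_i^(m0+1), ..., a_i^(m0+n)) reduced mod p."""
    return [[pow(x, m0 + j, p) for j in range(len(a))] for x in a]


def product_formula(a, m0, p=P):
    """prod_i a_i^m0 * prod_{i>j} (a_i - a_j), mod p."""
    result = 1
    for x in a:
        result = result * pow(x, m0, p) % p
    for i in range(len(a)):
        for j in range(i):
            result = result * (a[i] - a[j]) % p
    return result


mismatches = [
    (a, m0)
    for a in product(range(P), repeat=3)
    for m0 in (0, 1, 2)
    if det_mod_p(vandermonde(a, m0)) != product_formula(a, m0)
]
print(len(mismatches))  # 0
```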

b. Reed-Solomon Codes

A Reed-Solomon code$^{25}$ on GF(q) consists of all words f of length $n \le q - 1$ for which the d - 1 equations
$$\sum_{i=1}^{n} f_i \alpha^{(i-1)m} = 0, \qquad m_0 \le m \le m_0 + d - 2,$$
are satisfied, where $m_0$ and d are arbitrary integers, and α is a primitive element of GF(q).

Clearly, an RS code is a linear code, since if $f_1$ and $f_2$ are code words satisfying the equations, $\beta f_1 + \gamma f_2 = f_3$ satisfies the equations. We shall now show that any n - d + 1 places of an RS code can be taken to be an information set, and therefore that an RS code is a maximum code with minimum distance d.

We define the locator $Z_i$ of the i-th place as $\alpha^{i-1}$; then we have $\sum_i f_i (Z_i)^m = 0$, $m_0 \le m \le m_0 + d - 2$. Since α is primitive and $n \le q - 1$, the locators are distinct nonzero elements of GF(q). Let us arbitrarily assign elements of GF(q) to n - d + 1 places. The claim is that, whatever the places, there is a unique code word with those elements in those places, and therefore any n - d + 1 places form an information set S. To prove this, we show that the symbols in the complementary check set $\bar{S}$ can be solved for uniquely, given the symbols in the information set. Let the locators of the check set $\bar{S}$ be $Y_j$, $1 \le j \le d - 1$, and the corresponding symbols be $d_j$. If there is a set of $d_j$ that forms a code word together with the given information symbols, then
$$\sum_{j=1}^{d-1} d_j (Y_j)^m = -\sum_{i \in S} f_i (Z_i)^m, \qquad m_0 \le m \le m_0 + d - 2.$$
By defining $S_m = -\sum_{i \in S} f_i (Z_i)^m$, these d - 1 equations can be written
$$\begin{pmatrix} Y_1^{m_0} & Y_2^{m_0} & \cdots & Y_{d-1}^{m_0} \\ Y_1^{m_0+1} & Y_2^{m_0+1} & \cdots & Y_{d-1}^{m_0+1} \\ \vdots & \vdots & & \vdots \\ Y_1^{m_0+d-2} & Y_2^{m_0+d-2} & \cdots & Y_{d-1}^{m_0+d-2} \end{pmatrix} \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_{d-1} \end{pmatrix} = \begin{pmatrix} S_{m_0} \\ S_{m_0+1} \\ \vdots \\ S_{m_0+d-2} \end{pmatrix}.$$
The coefficient matrix is of the van der Monde-like type examined above. It has nonzero determinant, since the locators are nonzero and distinct. Therefore the $d_j$ have a unique solution for any assignment to the information places, so that an arbitrary set of n - d + 1 places can be taken as an information set. It follows that Reed-Solomon codes are maximum and have minimum distance d. The complete distribution of their weights has already been determined.
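The proof is constructive. Given any n - d + 1 information places, the check symbols come from solving the van der Monde-like system. This minimal sketch assumes the prime field GF(7) with primitive element 3, $m_0 = 1$, and locators $Z_i = 3^{i-1}$; positions are indexed from 0 in the code. `rs_encode` and `solve_mod_p` are hypothetical helper names. Encoding every message through two different information sets yields the same $7^4$ code words, and their minimum weight is d.

```python
from itertools import product

P, ALPHA = 7, 3  # assumption: GF(7) with primitive element 3 (3^1..3^6 = 3, 2, 6, 4, 5, 1)


def solve_mod_p(A, b, p=P):
    """Solve the square nonsingular system A x = b over GF(p) by Gauss-Jordan elimination."""
    n = len(A)
    m = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] % p)
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], p - 2, p)
        m[col] = [v * inv % p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(v - f * w) % p for v, w in zip(m[r], m[col])]
    return [row[n] for row in m]


def rs_encode(info, info_places, n, d, m0=1):
    """Place the information symbols, then solve the d - 1 equations
    sum_i f_i Z_i^m = 0 (m0 <= m <= m0 + d - 2) for the check symbols.
    Locators: Z_i = ALPHA^(i-1) for places i = 1..n (list index i - 1)."""
    locators = [pow(ALPHA, i, P) for i in range(n)]
    check_places = [i for i in range(n) if i not in info_places]
    word = [0] * n
    for place, symbol in zip(info_places, info):
        word[place] = symbol
    A, rhs = [], []
    for m in range(m0, m0 + d - 1):
        A.append([pow(locators[j], m, P) for j in check_places])
        rhs.append(-sum(word[i] * pow(locators[i], m, P) for i in info_places) % P)
    for place, value in zip(check_places, solve_mod_p(A, rhs)):
        word[place] = value
    return tuple(word)


n, d = 6, 3
k = n - d + 1
code_a = {rs_encode(info, [0, 1, 2, 3], n, d) for info in product(range(P), repeat=k)}
code_b = {rs_encode(info, [1, 3, 4, 5], n, d) for info in product(range(P), repeat=k)}
print(code_a == code_b, len(code_a))                            # True 2401
print(min(sum(1 for x in w if x) for w in code_a if any(w)))    # 3
```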

As examples, RS codes on GF(4) have length 3 (or less). The code of all words satisfying the single equation $f_1 + f_2 + f_3 = 0$ ($m_0 = 0$) has minimum distance 2. Taking the last symbol as the check symbol, we have $f_3 = f_1 + f_2$ (we omit minus signs, since we are in a field of characteristic two), so that the code words are

000     101     α0α     α²0α²
011     110     α1α²    α²1α
0αα     1αα²    αα0     α²α1
0α²α²   1α²α    αα²1    α²α²0

The code of all words satisfying $f_1 + f_2 + f_3 = 0$ and $f_1 + f_2\alpha + f_3\alpha^2 = 0$ ($m_0 = 0$) has minimum distance 3. Letting $f_2 = \alpha f_1$ and $f_3 = \alpha^2 f_1$, we get the code words

000     1αα²    αα²1    α²1α

The code of all words satisfying $f_1 + f_2\alpha + f_3\alpha^2 = 0$ and $f_1 + f_2\alpha^2 + f_3\alpha^4 = 0$ ($m_0 = 1$) also has minimum distance 3; its code words are

000     111     ααα     α²α²α²

c.  Shortened  RS  Codes 

A Reed-Solomon code can have length no longer than q - 1, for that is the total number of nonzero distinct elements from GF(q) that can be used as locators. (If $m_0 = 0$, we can also let 0 be a locator, with the convention $0^0 = 1$, to get a code of length q.) If we want a code of length n < q - 1, we can clearly use any subset of the nonzero elements of GF(q) as locators.

Frequently, in concatenating codes, we meet the condition that q is very large, while n needs to be only moderately large. Under these conditions it is usually possible to find a subfield GF(q') of GF(q) such that $n < q'$. A considerable practical simplification then occurs when we choose the locators from the subfield GF(q'). Recall that if $q'^b = q$, we can represent a particular symbol $f_i$ by a sequence of b elements from GF(q'), $(f_{i1}, f_{i2}, \ldots, f_{ib})$. The conditions $\sum_i f_i Z_i^m = 0$, $m_0 \le m \le m_0 + d - 2$, then become the conditions
$$\sum_i f_{ij} Z_i^m = 0, \qquad m_0 \le m \le m_0 + d - 2, \quad 1 \le j \le b,$$
since when we add two $f_i$ or multiply them by $Z_i^m$, we can do so term-by-term in GF(q'). In effect, we are interlacing b independent Reed-Solomon codes of length $n \le q' - 1$. The practical advantage is that, rather than having to decode an RS code defined on GF(q), we need merely decode RS codes defined on the much smaller field GF(q'), b times. This particular choice of locators cannot decrease the performance of the codes. It may improve it if, when the complete symbol from GF(q) is in error, only a few of its constituent elements from GF(q') tend to be in error.

As an example, suppose we choose $m_0 = 1$ and use locators from GF(4) to get an RS code on GF(16) of length 3 and minimum distance 3. Using the representation of GF(16) in terms of GF(4), we get the 16 code words below, each written as its top row over its bottom row (each column is one symbol of GF(16)):

000/000        000/111        000/ααα        000/α²α²α²
111/000        111/111        111/ααα        111/α²α²α²
ααα/000        ααα/111        ααα/ααα        ααα/α²α²α²
α²α²α²/000     α²α²α²/111     α²α²α²/ααα     α²α²α²/α²α²α²

These are in effect two independent RS codes on GF(4).

3.4  BCH  CODES 

We shall now give a general method for finding a code with symbols from GF(q) of length n and minimum distance at least d. If $n \le q - 1$, of course, an RS code will be the best choice, since it is maximum. But often n is larger than q; for instance, if we want a binary code, q = 2, and the longest RS code has length one. BCH codes$^{2,6,27}$ are a satisfactory solution to this problem when n is not extravagantly large, and are the only general solution known.

Let us find an integer a such that $q^a > n$. Then there is an RS code on $GF(q^a)$ with length n and minimum distance d. Since GF(q) is a subfield of $GF(q^a)$, a certain subset of the code words in this code will have all symbols in GF(q). The minimum distance between any two words in this subset must be at least as great as the minimum distance of the code. This subset can therefore be taken as a code on GF(q) with length n and minimum distance at least d. Any such subset is a BCH code.

We shall call GF(q) the symbol field and $GF(q^a)$ the locator field of the code.

For example, from the three RS codes on GF(4) given as examples, we can derive the three binary codes:

a) 000, 011, 101, 110;    b) 000;    c) 000, 111.

Sums and products of elements from GF(q) are again elements of GF(q). Hence any linear combination, with coefficients from GF(q), of words in the subset of code words with symbols from GF(q) is another word with symbols from GF(q), so that the subset forms a linear code. There must therefore be $q^k$ words in the code, where k has yet to be determined. How useful the code is depends on how large k is. Example b) shows that k can even be zero, and examples b) and c) show that k depends in general on the choice of $m_0$. We now show how to find the number of information symbols in a BCH code.
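The three binary examples can be recovered mechanically. Enumerate each RS code on GF(4), then keep the words whose symbols all lie in the subfield {0, 1}. This sketch assumes GF(4) is represented with α = x modulo $x^2 + x + 1$ (integers 0, 1, 2 = α, 3 = α²) and uses locators $Z_i = \alpha^{i-1}$. The function names are hypothetical.

```python
from itertools import product

# Assumption: GF(4) = {0, 1, 2, 3} with 2 = alpha (the polynomial x) and 3 = alpha^2 = x + 1,
# i.e. arithmetic modulo x^2 + x + 1; addition is bitwise XOR.
EXP = [1, 2, 3]                 # alpha^0, alpha^1, alpha^2
LOG = {1: 0, 2: 1, 3: 2}


def mul(u, v):
    return 0 if u == 0 or v == 0 else EXP[(LOG[u] + LOG[v]) % 3]


def power(u, m):
    return 1 if m == 0 else (0 if u == 0 else EXP[(LOG[u] * m) % 3])


def satisfies(word, m, locators):
    """True if sum_i f_i * Z_i^m == 0 in GF(4)."""
    total = 0
    for f, z in zip(word, locators):
        total ^= mul(f, power(z, m))
    return total == 0


def rs_code_gf4(m0, d, n=3):
    """All words of length n over GF(4) meeting the d - 1 equations, locators Z_i = alpha^(i-1)."""
    locators = [EXP[i % 3] for i in range(n)]
    return [
        w for w in product(range(4), repeat=n)
        if all(satisfies(w, m, locators) for m in range(m0, m0 + d - 1))
    ]


def binary_subcode(code):
    """Keep the words whose symbols all lie in the subfield GF(2) = {0, 1}."""
    return sorted(w for w in code if all(x in (0, 1) for x in w))


for label, m0, d in (("a", 0, 2), ("b", 0, 3), ("c", 1, 3)):
    code = rs_code_gf4(m0, d)
    print(label, len(code), binary_subcode(code))
```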

Since all code words are code words in the original RS code, all must satisfy the equations
$$\sum_i f_i Z_i^m = 0, \qquad m_0 \le m \le m_0 + d - 2.$$


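Let the characteristic of the locator field $GF(q^a)$ be p. Then $q^a = p^{ma}$ and $q = p^m$, and raising to the q-th power is a linear operation, $(\beta + \gamma)^q = \beta^q + \gamma^q$. Raising each side of these equations to the q-th power, we obtain
$$0 = \Big(\sum_i f_i Z_i^m\Big)^q = \sum_i f_i^q Z_i^{mq} = \sum_i f_i Z_i^{mq}, \qquad m_0 \le m \le m_0 + d - 2.$$
Here, we have used $f_i^q = f_i$, since $f_i$ is an element of GF(q). Repeating this operation, we obtain
$$\sum_i f_i Z_i^{mq^j} = 0, \qquad 0 \le j \le a - 1, \quad m_0 \le m \le m_0 + d - 2, \tag{40}$$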


where the process terminates at j = a - 1, since $Z_i$ is an element of $GF(q^a)$ and therefore $(Z_i)^{q^a} = Z_i$. Not all of these equations are different. If $mq^j \equiv m'q^{j'} \pmod{q^a - 1}$ for some $(m', j') \ne (m, j)$, then $Z_i^{mq^j} = Z_i^{m'q^{j'}}$ for all i. Let us denote by r the number of equations that are distinct, that is, the number of distinct integers modulo $q^a - 1$ in the set
$$\begin{array}{lllll} m_0, & q m_0, & q^2 m_0, & \ldots, & q^{a-1} m_0, \\ m_0+1, & q(m_0+1), & q^2(m_0+1), & \ldots, & q^{a-1}(m_0+1), \\ \vdots & & & & \\ m_0+d-2, & q(m_0+d-2), & q^2(m_0+d-2), & \ldots, & q^{a-1}(m_0+d-2). \end{array}$$
Clearly, $r \le a(d-1)$. We label the distinct members of this set $m_\ell$, $1 \le \ell \le r$.

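We now show that r is the number of check symbols in the code. Let β be any element of $GF(q^a)$ with r distinct consecutive powers $\beta^b, \beta^{b+1}, \ldots, \beta^{b+r-1}$. We claim that the places whose locators are these r consecutive powers of β may be taken as a check set, and the remaining n - r as an information set. Let the symbols in the information set S be chosen arbitrarily. These information symbols determine a unique code word if the r equations $\sum_i f_i (Z_i)^{m_\ell} = 0$, $1 \le \ell \le r$, have a unique solution. In matrix form these equations are
$$\begin{pmatrix} \beta^{b m_1} & \beta^{(b+1) m_1} & \cdots & \beta^{(b+r-1) m_1} \\ \beta^{b m_2} & \beta^{(b+1) m_2} & \cdots & \beta^{(b+r-1) m_2} \\ \vdots & \vdots & & \vdots \\ \beta^{b m_r} & \beta^{(b+1) m_r} & \cdots & \beta^{(b+r-1) m_r} \end{pmatrix} \begin{pmatrix} f_{(b)} \\ f_{(b+1)} \\ \vdots \\ f_{(b+r-1)} \end{pmatrix} = \begin{pmatrix} S_1 \\ S_2 \\ \vdots \\ S_r \end{pmatrix}, \tag{41}$$
in which $f_{(k)}$ denotes the symbol in the place whose locator is $\beta^k$, and we have defined $S_\ell = -\sum_{i \in S} f_i Z_i^{m_\ell}$. The coefficient matrix is van der Monde-like, for a different reason than before: its ℓ-th row is $\beta^{b m_\ell}$ times the successive powers $1, \beta^{m_\ell}, \beta^{2m_\ell}, \ldots$ of $\beta^{m_\ell}$. Since the $\beta^{m_\ell}$ are all nonzero and distinct, the equations have the solution as claimed.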

We must show that the symbols that solve Eqs. 41 are elements of the symbol field GF(q). Suppose we raise Eqs. 41 to the q-th power; we get a superficially new set of equations of the form
$$\sum_i f_i^q (Z_i)^{q m_\ell} = 0, \qquad 1 \le \ell \le r. \tag{42}$$
But for $i \in S$, $f_i \in GF(q)$, so $f_i^q = f_i$. Furthermore, Eqs. 42 are exactly the r distinct equations from which Eqs. 41 were formed. The distinct exponents $m_\ell$ were taken from the full set of Eqs. 40, and multiplication by q carries that set of exponents into itself modulo $q^a - 1$. Thus the q-th powers of the check symbols solve Eqs. 41 for the same information symbols $f_i$, $i \in S$, as the check symbols themselves, which were shown to be the unique solution to Eqs. 41. Therefore $f_i^q = f_i$ for every check place i as well. But the elements of $GF(q^a)$ that satisfy $\beta^q = \beta$ are precisely the elements of GF(q), so the check symbols are elements of GF(q).

Thus the code has an information set of n - r symbols, and therefore there are $q^{n-r}$ code words.
We remark that any set of r places whose locators can be represented as r consecutive powers of some field element is thus a check set, and the remaining n - r places form an information set. In general not every information set can be specified in this way, but this gives us a lower bound to their number.

For example, to find the number of check symbols in a binary code of length 15 ($q^a = 16$) and minimum distance 7, with $m_0$ chosen as 1, we write the set

1, 2, 4, 8
3, 6, 12, 9     (24 ≡ 9 mod 15)
5, 10           (20 ≡ 5 mod 15)

where we have excluded all duplicates. There are thus 10 check symbols. This is the (15, 5) binary Bose-Chaudhuri code.
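Counting the distinct exponents is a small set computation. This sketch transcribes the rule above directly; the function name is hypothetical. It reproduces r = 10 for the length-15 binary example. It also reproduces the binary codes derived earlier from the RS codes on GF(4), where q = 2, a = 2, and the length is 3.

```python
def bch_check_exponents(q, a, m0, d):
    """Distinct integers mod q^a - 1 among q^j * m, 0 <= j <= a-1, m0 <= m <= m0+d-2.
    Their number r is the number of check symbols (a direct transcription of the text's rule)."""
    modulus = q ** a - 1
    return sorted({(q ** j * m) % modulus for m in range(m0, m0 + d - 1) for j in range(a)})


exps = bch_check_exponents(q=2, a=4, m0=1, d=7)
r = len(exps)
print(exps)        # [1, 2, 3, 4, 5, 6, 8, 9, 10, 12]
print(r, 15 - r)   # 10 5
```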

a.  Asymptotic  Properties  of  BCH  Codes 

We recall that for large n the Gilbert bound guarantees the existence of a code with minimum distance nδ and dimensionless rate
$$\frac{k}{n} = 1 - \frac{\mathcal{H}(\delta)}{\ln q} - \delta\,\frac{\ln(q-1)}{\ln q}.$$
With a BCH code we are guaranteed to need no more than $a(d-1) < an\delta$ check symbols to get a minimum distance of at least $d = n\delta$. But since $q^a$ must be greater than n, a must be greater than $\ln n / \ln q$, so that for any fixed nonzero δ, $an\delta$ exceeds n for very large n. Thus, at least to the accuracy of this bound, BCH codes are useless for very large n. It should be pointed out, however, that cases are known in which the minimum distance of the BCH code is considerably larger than that of the RS code from which it was derived. It is also suspected that their asymptotic performance is not nearly as bad as this result would indicate, but nothing bearing on this question has been proved.
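A few numbers make the comparison concrete. This sketch evaluates the Gilbert-bound rate above, assuming $\mathcal{H}$ is the entropy function in natural units, consistent with the division by ln q. It also evaluates the rate guaranteed by the crude count $r \le a(d-1)$ for binary BCH codes of length $n = 2^a - 1$ with $d - 1 = \lfloor \delta n \rfloor$. The choice δ = 0.1 and the helper names are illustrative assumptions. The guaranteed BCH rate falls to zero once a exceeds roughly 1/δ, while the Gilbert rate stays fixed.

```python
import math


def gilbert_rate(q, delta):
    """k/n = 1 - H(delta)/ln q - delta ln(q-1)/ln q, with H(delta) the entropy in nats."""
    h = -delta * math.log(delta) - (1 - delta) * math.log(1 - delta)
    return 1 - h / math.log(q) - delta * math.log(q - 1) / math.log(q)


def bch_guaranteed_rate(q, a, delta):
    """Rate left after at most a(d-1) check symbols, for n = q^a - 1 and d - 1 = floor(delta n)."""
    n = q ** a - 1
    return max(0, n - a * math.floor(delta * n)) / n


# Illustrative parameters: binary codes, delta = 0.1.
for a in (4, 6, 8, 10, 12):
    print(a, round(bch_guaranteed_rate(2, a, 0.1), 3), round(gilbert_rate(2, 0.1), 3))
```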
IV. DECODING BCH CODES


We shall present here a decoding algorithm for BCH codes. Much of it is based on the error-correcting algorithm of Gorenstein and Zierler$^{27}$. We have extended the algorithm to do deletions-and-errors and hence generalized minimum-distance decoding (cf. Section II). We have also appreciably simplified the final, erasure-correcting step.

Since we intend to use a Reed-Solomon code as the outer code in all of our concatenation schemes, and minimization of decoder complexity is our purpose, we shall consider in some detail, in Section VI, the implementation of this algorithm in a special- or general-purpose computer.

Variations  on  this  algorithm  of  lesser  interest  are  reported  in  Appendix  A. 


4.1  INTRODUCTION 

In Section III we observed that a BCH code is a subset of words from an RS code on GF(q) whose symbols are all members of some subfield of GF(q). Therefore we may use the same algorithm that decodes a certain RS code to decode all BCH codes derived from that code. The proviso is that if the algorithm produces a code word of the RS code that is not a code word in the BCH code being used, a decoding failure is detected.

Let us then consider the transmission of some code word $f = (f_1, f_2, \ldots, f_n)$ from a BCH code whose words satisfy
$$\sum_i f_i Z_i^m = 0, \qquad m_0 \le m \le m_0 + d - 2,$$
where the $Z_i$, the locators, are nonzero distinct elements of GF(q). In examples we shall use the RS code on GF(16) with n = 15, $m_0 = 1$, and d = 9, and represent GF(16) as follows:


0 = 0000      α³ = 0001     α⁷ = 1101      α¹¹ = 0111
1 = 1000      α⁴ = 1100     α⁸ = 1010      α¹² = 1111
α = 0100      α⁵ = 0110     α⁹ = 0101      α¹³ = 1011
α² = 0010     α⁶ = 0011     α¹⁰ = 1110     α¹⁴ = 1001

We shall let $Z_i = \alpha^{-i} = \alpha^{15-i}$.
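The representation table can be regenerated as a check. This sketch assumes that the four bits are listed constant term first and that α satisfies $\alpha^4 = \alpha + 1$ (modulus $x^4 + x + 1$); under that reading it reproduces every entry of the table. The function name is hypothetical.

```python
def gf16_table():
    """Map k -> bit string of alpha^k, k = 0..14, constant term first (a0 a1 a2 a3).
    Assumption: alpha^4 = alpha + 1, i.e. the field is built modulo x^4 + x + 1."""
    table = {}
    x = 1                       # integer whose bit i is the coefficient of x^i
    for k in range(15):
        table[k] = "".join(str((x >> bit) & 1) for bit in range(4))
        x <<= 1
        if x & 0b10000:         # reduce x^4 -> x + 1
            x ^= 0b10011
    return table


for k, bits in gf16_table().items():
    print(f"alpha^{k:<2} {bits}")
```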

We suppose that in the received word $r = (r_1, r_2, \ldots, r_n)$, s symbols have been classed as unreliable, or erased. Let the locators of these symbols be $Y_k$, $1 \le k \le s$. If the k-th deletion is in the i-th place, let $d_k = r_i - f_i$ be the value of the deletion, possibly zero. Also, of the symbols classed as reliable, let t actually be incorrect. Let the locators of these errors be $X_j$, $1 \le j \le t$. If the j-th error is in the i-th place, let its value be $e_j = r_i - f_i$, where now $e_j \ne 0$. We define the parity checks, or syndromes, $S_m$ by
$$S_m = \sum_i r_i Z_i^m, \qquad m_0 \le m \le m_0 + d - 2;$$
then, since f is a code word,
$$S_m = \sum_i f_i Z_i^m + \sum_j e_j X_j^m + \sum_k d_k Y_k^m = \sum_j e_j X_j^m + \sum_k d_k Y_k^m .$$
The decoding problem is to find the $e_j$, $X_j$, and $d_k$ from the $S_m$ and $Y_k$. The following algorithm solves this problem whenever 2t + s < d.
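The relation between the syndromes and the error and erasure values can be checked numerically on the GF(16) example. In this sketch:

- the arithmetic assumes $\alpha^4 = \alpha + 1$, consistent with the representation of GF(16) given above;
- the locators are taken as $Z_i = \alpha^{i-1}$ (an assumption, since any distinct nonzero locators would do);
- a code word is produced as the coefficients of $g(x)u(x)$ with $g(x) = \prod_{m=1}^{8}(x - \alpha^m)$;
- the message, error, and erasure values are arbitrary hypothetical choices with 2t + s < d.

```python
# GF(16) log/antilog tables; assumption: alpha^4 = alpha + 1 (modulus x^4 + x + 1).
EXP, LOG = [0] * 30, [0] * 16
x = 1
for k in range(15):
    EXP[k] = EXP[k + 15] = x
    LOG[x] = k
    x = (x << 1) ^ (0b10011 if x & 0b1000 else 0)


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def power(a, m):
    return 1 if m == 0 else (0 if a == 0 else EXP[(LOG[a] * m) % 15])


def poly_mul(p, q):
    """Product of polynomials with GF(16) coefficients (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= mul(a, b)
    return out


def syndromes(word, locators, m0, d):
    """S_m = sum_i r_i Z_i^m for m0 <= m <= m0 + d - 2 (addition is XOR)."""
    result = []
    for m in range(m0, m0 + d - 1):
        s = 0
        for r, z in zip(word, locators):
            s ^= mul(r, power(z, m))
        result.append(s)
    return result


N, M0, D = 15, 1, 9
locators = [EXP[i] for i in range(N)]            # assumption: Z_i = alpha^(i-1), list index i - 1
g = [1]
for m in range(M0, M0 + D - 1):
    g = poly_mul(g, [EXP[m], 1])                 # times (x + alpha^m); minus equals plus here
codeword = poly_mul(g, [3, 0, 7, 1, 0, 0, 9])    # hypothetical 7-symbol message

errors = {2: 5, 9: 12}                           # hypothetical place -> e_j (nonzero)
erasures = {4: 0, 11: 6}                         # hypothetical place -> d_k (may be zero)
received = list(codeword)
for place, value in {**errors, **erasures}.items():
    received[place] ^= value                     # r_i = f_i + value

predicted = [0] * (D - 1)
for idx, m in enumerate(range(M0, M0 + D - 1)):
    for place, value in {**errors, **erasures}.items():
        predicted[idx] ^= mul(value, power(locators[place], m))

print(syndromes(codeword, locators, M0, D))               # [0, 0, 0, 0, 0, 0, 0, 0]
print(syndromes(received, locators, M0, D) == predicted)  # True
```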

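We shall find it convenient in the sequel to define the column vectors
$$\mathbf{S}_{(a,b)} = (S_a, S_{a-1}, \ldots, S_b)^T, \qquad a \ge b,$$
$$\mathbf{X}_{j(a,b)} = (X_j^a, X_j^{a-1}, \ldots, X_j^b)^T,$$
$$\mathbf{Y}_{k(a,b)} = (Y_k^a, Y_k^{a-1}, \ldots, Y_k^b)^T.$$
Evidently,
$$\mathbf{S}_{(a,b)} = \sum_{j=1}^{t} e_j \mathbf{X}_{j(a,b)} + \sum_{k=1}^{s} d_k \mathbf{Y}_{k(a,b)}.$$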

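Finally, let us consider the polynomial σ(Z) defined by
$$\sigma(Z) = (Z - Z_1)(Z - Z_2) \cdots (Z - Z_L),$$
where the $Z_\ell$ are members of a field. Clearly σ(Z) = 0 if and only if Z equals one of the $Z_\ell$. Expanding σ(Z), we get
$$\sigma(Z) = Z^L - (Z_1 + Z_2 + \cdots + Z_L) Z^{L-1} + \cdots + (-1)^L (Z_1 Z_2 \cdots Z_L).$$
The coefficient of $(-1)^\ell Z^{L-\ell}$ in this expansion is defined as the ℓ-th elementary symmetric function $\sigma_\ell$ of $Z_1, Z_2, \ldots, Z_L$; note that $\sigma_0$ is always one. We define $\boldsymbol{\sigma}$ as the row vector
$$\boldsymbol{\sigma} = (\sigma_0, -\sigma_1, \sigma_2, \ldots, (-1)^L \sigma_L);$$
then the dot product
$$\boldsymbol{\sigma} \cdot \mathbf{Z}_{(L,0)} = \sigma(Z),$$
where
$$\mathbf{Z}_{(L,0)} = (Z^L, Z^{L-1}, \ldots, 1)^T.$$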
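The vector $\boldsymbol{\sigma}$ can be built by multiplying out the factors one at a time. This sketch works in GF(16), assuming $\alpha^4 = \alpha + 1$, where the alternating signs disappear because $-x = x$. The roots are hypothetical choices. The code checks that $\boldsymbol{\sigma} \cdot \mathbf{Z}_{(L,0)}$ vanishes exactly at the roots.

```python
# GF(16) arithmetic; assumption: alpha^4 = alpha + 1 (modulus x^4 + x + 1).
EXP, LOG = [0] * 30, [0] * 16
x = 1
for k in range(15):
    EXP[k] = EXP[k + 15] = x
    LOG[x] = k
    x = (x << 1) ^ (0b10011 if x & 0b1000 else 0)


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def sigma_vector(roots):
    """(sigma_0, -sigma_1, ..., (-1)^L sigma_L): coefficients of (Z - Z_1)...(Z - Z_L),
    highest power first. In characteristic two the signs can be dropped."""
    coeffs = [1]
    for z in roots:
        # (Z - z) * c(Z) = Z * c(Z) + z * c(Z), since -z = z here
        coeffs = [a ^ mul(z, b) for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


def powers_vector(z, top):
    """Z_(top,0) = (z^top, z^(top-1), ..., z^0), for nonzero z."""
    return [EXP[(LOG[z] * e) % 15] for e in range(top, -1, -1)]


def dot(u, v):
    acc = 0
    for a, b in zip(u, v):
        acc ^= mul(a, b)
    return acc


roots = [EXP[3], EXP[7], EXP[12]]                 # hypothetical Z_1, Z_2, Z_3
sigma = sigma_vector(roots)
print(sigma[0], [dot(sigma, powers_vector(z, len(roots))) for z in roots])  # 1 [0, 0, 0]
```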


4.2 MODIFIED CYCLIC PARITY CHECKS


The $S_m$ are not the only parity checks that could be formed; in fact, any linear combination of the $S_m$ is also a valid parity check. We look for a set of d - s - 1 independent parity-check equations which, unlike the $S_m$, do not depend on the erased symbols, yet retain the general properties of the $S_m$.

Let  be  the  vector  of  the  symmetric  functions  of  the  erasure  locators  Yj^.  We 
define  the  modified  cyclic  parity  checks  T^  by 

"  "'d  ■  ^(m^+£+s,  m^+£)- 

Since  we  must  have  m  <  m  +1  and  m  +  l  +  s<n'  +d-2,  the  range  of  1  is  O^l^Sd-s-Z, 

o  o  o 

In  the  case  of  no  erasures,  T.  =  S„  ,  ..  Now,  since 

o 


ni>£. 


(m^+£+s,  m^+£) 


v'  m  • 

=  y  e-iX.  ° 

L  3  3 


^3(s,o)  + 


nio+£^ 
d  Y  Y 
'^k^k  k(s,o)' 


v/e  have 


i  °^d  '  ^(m^+£+s,m^+£) 


^  .n  s 

Z  "d-  Xj(s,o)"Z 


'•k’fk  "d  ■  o) 


V  ™r>  0  K'  m  +£ 

Z  "Zj  '^d<^j>5  +  Z  “k^k  °  -d(Yk> 


in  which  ws  have  defined  Ej^e^Xj  °or^(Xj)  and  used  o'jj(Yj^)  =  0,  since  Yj^  is  one  of  the 
erasure  locators  upon  which is  defined.  The  fact  that  the  modified  cyclic  parity  checks 
can  be  expressed  as  the  simple  function  of  the  error  locators  given  by  Eq.  42  allows  us 
to  solve  for  the  error  locators  in  the  same  way  as  if  there  were  no  erasures  and  the 


minimum  distance  were  d  -  s. 
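The sketch below forms the S_m and T_ℓ for the data of Example 1 and checks Eq. 42 directly. The erasure terms cancel, and T_ℓ reduces to the sum of E_j X_j^ℓ. It assumes GF(16) built from x^4 + x + 1 and m_o = 1, as in Example 1. The names syndromes, modified_checks and poly_eval are illustrative only, and m0 stands for m_o.

```python
from functools import reduce
from operator import xor

# GF(16) from the primitive polynomial x^4 + x + 1 (assumed, consistent with Example 1).
EXP, LOG = [0] * 30, [0] * 16
_v = 1
for _i in range(15):
    EXP[_i] = EXP[_i + 15] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x10:
        _v ^= 0b10011

def alpha(k):
    return EXP[k % 15]

def mul(u, v):
    return 0 if u == 0 or v == 0 else EXP[LOG[u] + LOG[v]]

def power(u, k):
    return 1 if k == 0 else (0 if u == 0 else EXP[(LOG[u] * k) % 15])

def xor_all(values):
    return reduce(xor, values, 0)

def elementary_symmetric(roots):
    sig = [1]
    for z in roots:
        sig = [p ^ mul(z, q) for p, q in zip(sig + [0], [0] + sig)]
    return sig

def poly_eval(sig, z):
    """sigma(z) = sum over l of sigma_l * z^(L-l) (no signs in characteristic two)."""
    return xor_all(mul(c, power(z, len(sig) - 1 - l)) for l, c in enumerate(sig))

def syndromes(terms, m0, d):
    """S_m = sum of value * locator^m over (value, locator) pairs, m0 <= m <= m0+d-2."""
    return {m: xor_all(mul(v, power(z, m)) for v, z in terms) for m in range(m0, m0 + d - 1)}

def modified_checks(S, sigma_d, m0, d):
    """T_l = sigma_d . (S_(m0+l+s), ..., S_(m0+l)) for 0 <= l <= d-s-2."""
    s = len(sigma_d) - 1
    return [xor_all(mul(c, S[m0 + l + s - i]) for i, c in enumerate(sigma_d))
            for l in range(d - s - 1)]

# Example 1: (value, locator) pairs for the two errors and the two erasures.
errors = [(alpha(4), alpha(14)), (alpha(1), alpha(11))]
erasures = [(1, alpha(13)), (alpha(7), alpha(12))]
m0, d = 1, 9
S = syndromes(errors + erasures, m0, d)
sigma_d = elementary_symmetric([z for _, z in erasures])
T = modified_checks(S, sigma_d, m0, d)

# Eq. 42: the erasure terms drop out, leaving T_l = sum_j E_j X_j^l.
E = [(mul(mul(e, power(X, m0)), poly_eval(sigma_d, X)), X) for e, X in errors]
assert T == [xor_all(mul(Ej, power(X, l)) for Ej, X in E) for l in range(len(T))]
```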
4.3 DETERMINING THE NUMBER OF ERRORS


If $d - s$ is odd, the maximum number of errors that can be corrected is $t_o = (d-s-1)/2$, while if $d - s$ is even, up to $t_o = (d-s-2)/2$ errors are correctable, and $t_o + 1$ are detectable.

We now show that the actual number of errors $t$ is the rank of a certain $t_o \times t_o$ matrix $M$, whose components are modified cyclic parity checks, as long as $t \le t_o$. In order to do this we use the theorem in algebra that the rank of a matrix is $t$ if and only if there is at least one $t \times t$ submatrix with nonzero determinant, and all $(t+1) \times (t+1)$ submatrices have zero determinants. We also use the fact that the determinant of a matrix which is the product of square matrices is the product of the determinants of the square matrices.

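THEOREM (after Gorenstein and Zierler): If $t \le t_o$, then $M$ has rank $t$, where

$$M \equiv \begin{pmatrix} T_{2t_o-2} & T_{2t_o-3} & \cdots & T_{t_o-1} \\ T_{2t_o-3} & T_{2t_o-4} & \cdots & T_{t_o-2} \\ \vdots & \vdots & & \vdots \\ T_{t_o-1} & T_{t_o-2} & \cdots & T_0 \end{pmatrix}.$$

Since $2t_o - 2 \le d - s - 2$, all the $T_\ell$ in this matrix are available.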

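PROOF: First consider the $t \times t$ submatrix $M_t$ formed by the first $t$ rows and columns of $M$. Using Eq. 42, we can write $M_t$ as the product of three $t \times t$ matrices as follows:

$$M_t = \begin{pmatrix} T_{2t_o-2} & T_{2t_o-3} & \cdots & T_{2t_o-t-1} \\ T_{2t_o-3} & T_{2t_o-4} & \cdots & T_{2t_o-t-2} \\ \vdots & & & \vdots \\ T_{2t_o-t-1} & T_{2t_o-t-2} & \cdots & T_{2t_o-2t} \end{pmatrix} = \begin{pmatrix} X_1^{t-1} & X_2^{t-1} & \cdots & X_t^{t-1} \\ X_1^{t-2} & X_2^{t-2} & \cdots & X_t^{t-2} \\ \vdots & & & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} E_1 X_1^{2t_o-2t} & & 0 \\ & \ddots & \\ 0 & & E_t X_t^{2t_o-2t} \end{pmatrix} \begin{pmatrix} X_1^{t-1} & X_1^{t-2} & \cdots & 1 \\ X_2^{t-1} & X_2^{t-2} & \cdots & 1 \\ \vdots & & & \vdots \\ X_t^{t-1} & X_t^{t-2} & \cdots & 1 \end{pmatrix},$$

as may be checked by direct multiplication.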


The center matrix is diagonal, and therefore has determinant

$$\prod_{j=1}^{t} E_j X_j^{2t_o-2t};$$

since $E_j = e_j X_j^{m_o} \sigma_d(X_j)$, $X_j \ne Y_k$, $e_j \ne 0$, and all locators are nonzero, this determinant is nonzero. The first and third matrices are Vandermonde, with determinants

$$\pm \prod_{i > j} (X_i - X_j),$$

which are nonzero since the error locators are distinct. The determinant $|M_t|$ is then the product of three nonzero factors, and is therefore itself nonzero. Thus the rank of $M$ is $t$ or greater.

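Now consider any of the $(t+1) \times (t+1)$ submatrices of $M$, which will have the general form

$$\begin{pmatrix} T_{a_0+b_0} & T_{a_0+b_1} & \cdots & T_{a_0+b_t} \\ T_{a_1+b_0} & T_{a_1+b_1} & \cdots & T_{a_1+b_t} \\ \vdots & & & \vdots \\ T_{a_t+b_0} & T_{a_t+b_1} & \cdots & T_{a_t+b_t} \end{pmatrix} = \begin{pmatrix} X_1^{a_0} & \cdots & X_t^{a_0} & 0 \\ X_1^{a_1} & \cdots & X_t^{a_1} & 0 \\ \vdots & & \vdots & \vdots \\ X_1^{a_t} & \cdots & X_t^{a_t} & 0 \end{pmatrix} \begin{pmatrix} E_1 & & & 0 \\ & \ddots & & \\ & & E_t & \\ 0 & & & 0 \end{pmatrix} \begin{pmatrix} X_1^{b_0} & X_1^{b_1} & \cdots & X_1^{b_t} \\ \vdots & & & \vdots \\ X_t^{b_0} & X_t^{b_1} & \cdots & X_t^{b_t} \\ 0 & 0 & \cdots & 0 \end{pmatrix}.$$

Again, this may be checked by direct multiplication with the use of Eq. 42. Each of the three factor matrices has an all-zero row or column and hence zero determinant; therefore all $(t+1) \times (t+1)$ submatrices of $M$ have zero determinants. Thus the rank of $M$ can be no greater than $t$; but then it is $t$.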
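The theorem can be spot-checked numerically. The sketch below draws random erasure-and-error patterns for the (15,7), d = 9 code, builds M from the T_ℓ, and compares its rank with t. It assumes GF(16) built from x^4 + x + 1, locators Z_i = α^(n-i) and m_o = 1. Every helper name is hypothetical. This is a sanity check on random cases, not a proof.

```python
import random
from functools import reduce
from operator import xor

# GF(16) from the primitive polynomial x^4 + x + 1 (an assumption consistent with Example 1).
EXP, LOG = [0] * 30, [0] * 16
_v = 1
for _i in range(15):
    EXP[_i] = EXP[_i + 15] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x10:
        _v ^= 0b10011

def alpha(k):
    return EXP[k % 15]

def mul(u, v):
    return 0 if u == 0 or v == 0 else EXP[LOG[u] + LOG[v]]

def inv(u):
    return EXP[(15 - LOG[u]) % 15]

def power(u, k):
    return 1 if k == 0 else (0 if u == 0 else EXP[(LOG[u] * k) % 15])

def xor_all(values):
    return reduce(xor, values, 0)

def elementary_symmetric(roots):
    sig = [1]
    for z in roots:
        sig = [p ^ mul(z, q) for p, q in zip(sig + [0], [0] + sig)]
    return sig

def syndromes(terms, m0, d):
    return {m: xor_all(mul(v, power(z, m)) for v, z in terms) for m in range(m0, m0 + d - 1)}

def modified_checks(S, sigma_d, m0, d):
    s = len(sigma_d) - 1
    return [xor_all(mul(c, S[m0 + l + s - i]) for i, c in enumerate(sigma_d))
            for l in range(d - s - 1)]

def rank(rows):
    """Rank over GF(16) by Gauss-Jordan elimination."""
    rows = [list(r) for r in rows]
    rk = 0
    for c in range(len(rows[0]) if rows else 0):
        p = next((k for k in range(rk, len(rows)) if rows[k][c]), None)
        if p is None:
            continue
        rows[rk], rows[p] = rows[p], rows[rk]
        f = inv(rows[rk][c])
        rows[rk] = [mul(f, v) for v in rows[rk]]
        for k in range(len(rows)):
            if k != rk and rows[k][c]:
                g = rows[k][c]
                rows[k] = [u ^ mul(g, v) for u, v in zip(rows[k], rows[rk])]
        rk += 1
    return rk

def hankel_M(T, t0):
    """Row i, column j (0-based) of M holds T_(2 t0 - 2 - i - j)."""
    return [[T[2 * t0 - 2 - i - j] for j in range(t0)] for i in range(t0)]

rng = random.Random(7)
n, m0, d = 15, 1, 9
for _ in range(300):
    s = rng.randrange(0, d - 2)                   # at most d - 3 erasures, so t0 >= 1
    t0 = (d - s - 1) // 2
    t = rng.randrange(0, t0 + 1)
    locs = [alpha(n - i) for i in rng.sample(range(1, n + 1), s + t)]  # Z_i = alpha^(n-i)
    errs = [(rng.randrange(1, 16), z) for z in locs[:t]]               # error values nonzero
    eras = [(rng.randrange(0, 16), z) for z in locs[t:]]               # erasure values may be 0
    T = modified_checks(syndromes(errs + eras, m0, d), elementary_symmetric(locs[t:]), m0, d)
    assert rank(hankel_M(T, t0)) == t
```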

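4.4 LOCATING THE ERRORS

We now consider the vector $\boldsymbol{\sigma}_e$ of elementary symmetric functions $\sigma_{ej}$ of the $X_j$, and its associated polynomial

$$\sigma_e(X) = \boldsymbol{\sigma}_e \cdot \mathbf{X}_{(t,0)},$$

where

$$\mathbf{X}_{(t,0)} \equiv (X^t, X^{t-1}, \ldots, 1)^T.$$

If we could find the components of $\boldsymbol{\sigma}_e$, we could determine the error locators by finding the $t$ distinct roots of $\sigma_e(X)$. If we define

$$\mathbf{T}_{(a,b)} \equiv (T_a, T_{a-1}, \ldots, T_b)^T, \qquad 0 \le b \le a \le d - s - 2,$$

then from Eq. 42

$$\mathbf{T}_{(a,b)} = \sum_{j=1}^{t} E_j \mathbf{X}_{j(a,b)},$$

and we have

$$\boldsymbol{\sigma}_e \cdot \mathbf{T}_{(\ell'+t,\; \ell')} = \sum_{j=1}^{t} E_j X_j^{\ell'} \sigma_e(X_j) = 0, \qquad 0 \le \ell' \le d - s - t - 2.$$

We know that the first component of $\boldsymbol{\sigma}_e$, $\sigma_{e0}$, equals one, so that this gives us a set of $d - s - t - 1 \ge 2t_o - t$ equations in $t$ unknowns. Since $t \le t_o$ by assumption, we can take the $t$ equations specified by $2t_o - 2t \le \ell' \le 2t_o - t - 1$, which in matrix form are

$$\begin{pmatrix} T_{2t_o-1} \\ T_{2t_o-2} \\ \vdots \\ T_{2t_o-t} \end{pmatrix} = \begin{pmatrix} T_{2t_o-2} & T_{2t_o-3} & \cdots & T_{2t_o-t-1} \\ T_{2t_o-3} & T_{2t_o-4} & \cdots & T_{2t_o-t-2} \\ \vdots & & & \vdots \\ T_{2t_o-t-1} & T_{2t_o-t-2} & \cdots & T_{2t_o-2t} \end{pmatrix} \begin{pmatrix} \sigma_{e1} \\ -\sigma_{e2} \\ \vdots \\ (-1)^{t+1}\sigma_{et} \end{pmatrix},$$

or, defining $\boldsymbol{\sigma}'_e \equiv (\sigma_{e1}, -\sigma_{e2}, \ldots, (-1)^{t+1}\sigma_{et})^T$,

$$\mathbf{T}_{(2t_o-1,\; 2t_o-t)} = M_t\, \boldsymbol{\sigma}'_e. \tag{43}$$

Since $0 \le 2t_o - 2t$ and $2t_o - 1 \le d - s - 2$, all of the $T_\ell$ needed to form these equations are available.

We have already shown that $M_t$ has rank $t$, so that these equations are soluble for $\boldsymbol{\sigma}'_e$ and hence $\boldsymbol{\sigma}_e$. Then since $\sigma_e(Z_i)$ is zero if and only if $Z_i$ is an error locator, calculation of $\sigma_e(Z_i)$ for each $i$ will reveal in turn the positions of all $t$ errors.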


a. Remarks


The two steps of finding the rank of $M$ and then solving a set of $t$ equations in $t$ unknowns may be combined into one. For, consider the equations

$$\mathbf{T}_{(2t_o-1,\; t_o)} = M\, \boldsymbol{\sigma}''_e, \tag{44}$$

where

$$\boldsymbol{\sigma}''_e \equiv (\sigma_{e1}, -\sigma_{e2}, \ldots, (-1)^{t+1}\sigma_{et}, 0, \ldots, 0)^T.$$

An efficient way of solving (44) is by a Gauss-Jordan reduction²⁸ to upper triangular form. Since the rank of $M$ is $t$, this will leave $t$ nontrivial equations, the last $t_o - t$ equations being simply $0 = 0$. But now $M_t$ is the upper left-hand corner of $M$, so that the upper left-hand corner of the reduced $M$ will be the reduced $M_t$. Therefore, we can at this point set the last $t_o - t$ components of $\boldsymbol{\sigma}''_e$ to zero, and get a set of equations equivalent to (43), which can be solved for $\boldsymbol{\sigma}'_e$. Thus we need only one reduction, not two; since Gauss-Jordan reductions tend to be tedious, this may be a significant saving.
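Here is a minimal sketch of the single-reduction procedure. It assumes GF(16) built from x^4 + x + 1 and characteristic two, so σ''_e carries no alternating signs. The code reduces the augmented matrix [M | T_(2t_o-1, t_o)] once, reads t off as the number of pivots, and sets the free components to zero. When the reduction does not take the form described above, it reports a detectable error. solve_sigma_e and locate_errors are hypothetical names. The root search here is a brute-force trial of every nonzero element.

```python
from functools import reduce
from operator import xor

# GF(16) from the primitive polynomial x^4 + x + 1 (an assumption consistent with Example 1).
EXP, LOG = [0] * 30, [0] * 16
_v = 1
for _i in range(15):
    EXP[_i] = EXP[_i + 15] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x10:
        _v ^= 0b10011

def alpha(k):
    return EXP[k % 15]

def mul(u, v):
    return 0 if u == 0 or v == 0 else EXP[LOG[u] + LOG[v]]

def inv(u):
    return EXP[(15 - LOG[u]) % 15]

def power(u, k):
    return 1 if k == 0 else (0 if u == 0 else EXP[(LOG[u] * k) % 15])

def xor_all(values):
    return reduce(xor, values, 0)

def solve_sigma_e(T, t0):
    """One Gauss-Jordan pass over Eq. 44; returns (t, [1, sigma_e1, ..., sigma_et]) or None."""
    # Augmented rows [M | T_(2t0-1, t0)]: row i holds T_(2t0-2-i-j) and right side T_(2t0-1-i).
    A = [[T[2 * t0 - 2 - i - j] for j in range(t0)] + [T[2 * t0 - 1 - i]] for i in range(t0)]
    r, pivots = 0, []
    for c in range(t0):
        p = next((k for k in range(r, t0) if A[k][c]), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        f = inv(A[r][c])
        A[r] = [mul(f, v) for v in A[r]]
        for k in range(t0):
            if k != r and A[k][c]:
                g = A[k][c]
                A[k] = [u ^ mul(g, v) for u, v in zip(A[k], A[r])]
        pivots.append(c)
        r += 1
    t = r                                    # rank of M = number of errors
    if pivots != list(range(t)) or any(A[k][t0] for k in range(t, t0)):
        return None                          # not reducible as described: a detectable error
    # The last t0 - t components of sigma''_e are set to zero; no signs in characteristic two.
    return t, [1] + [A[k][t0] for k in range(t)]

def locate_errors(sigma_e, n=15):
    """Exponents k with sigma_e(alpha^k) = 0, found by trying every nonzero element."""
    L = len(sigma_e) - 1
    return [k for k in range(n)
            if xor_all(mul(c, power(alpha(k), L - l)) for l, c in enumerate(sigma_e)) == 0]

T = [alpha(8), alpha(8), 0, alpha(3), alpha(13), alpha(3)]   # T_0 ... T_5 of Example 1
t, sigma_e = solve_sigma_e(T, t0=3)
error_exponents = locate_errors(sigma_e)
```

For the data of Example 1, the sketch should find t = 2, σ_e = (1, α^10, α^10), and roots α^11 and α^14.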

This procedure works whenever $t \le t_o$, that is, whenever the received word lies within distance $t_o$ of some code word, not counting places in which there are erasures. It will generally be possible to receive words greater than distance $t_o$ from any code word, and upon such words the procedure discussed above must fail. This failure, corresponding to a detectable error, will turn up either in the failure of Eq. 44 to be reducible to the form described above or in a $\sigma_e(X)$ which has an insufficient number of nonzero roots.

Finally, if $d - s$ is even, the preceding algorithm will locate all errors when $t \le t_o = (d-s-2)/2$. Also, if $t = t_o + 1$, an uncorrectable error can be detected by the nonvanishing of the determinant of the $(t_o+1) \times (t_o+1)$ matrix with $T_{d-s-2}$ in the upper left and $T_0$ in the lower right. Such an error would be detected anyway at some later stage in the correction process.


b. Example 1

Consider the (15,7) distance 9 RS code that has been introduced. Suppose there occur errors of value $\alpha^4$ in the first position and $\alpha$ in the fourth position, and erasures of value 1 in the second position and $\alpha^7$ in the third position:

$$(e_1 = \alpha^4,\; X_1 = \alpha^{14},\; e_2 = \alpha,\; X_2 = \alpha^{11},\; d_1 = 1,\; Y_1 = \alpha^{13},\; d_2 = \alpha^7,\; Y_2 = \alpha^{12}).$$

In this case the parity checks $S_m$ will turn out to be

$$S_1 = \alpha^{14},\; S_2 = \alpha^{13},\; S_3 = \alpha^5,\; S_4 = \alpha^6,\; S_5 = \alpha^9,\; S_6 = \alpha^{13},\; S_7 = \alpha^{10},\; S_8 = \alpha^4.$$

With these eight parity checks and two erasure locators, the decoder must find the number and position of the errors. First it forms $\boldsymbol{\sigma}_d$. (Since we are working in a field of characteristic two, where addition and subtraction are identical, we omit minus signs.)

$$\sigma_{d0} = 1,$$

$$\sigma_{d1} = Y_1 + Y_2 = \alpha^{13} + \alpha^{12} = (1011) + (1111) = (0100) = \alpha,$$

$$\sigma_{d2} = Y_1 Y_2 = \alpha^{13} \cdot \alpha^{12} = \alpha^{10}.$$

Next it forms the six modified cyclic parity checks by Eq. 41:

$$T_0 = S_3 + \sigma_{d1} S_2 + \sigma_{d2} S_1 = \alpha^5 + \alpha \cdot \alpha^{13} + \alpha^{10} \cdot \alpha^{14} = \alpha^5 + \alpha^{14} + \alpha^9 = (0110) + (1001) + (0101) = (1010) = \alpha^8,$$

$$T_1 = \alpha^8, \quad T_2 = 0, \quad T_3 = \alpha^3, \quad T_4 = \alpha^{13}, \quad T_5 = \alpha^3.$$

Equations 44 now take the form

$$\alpha^3 = \alpha^{13}\sigma_{e1} + \alpha^3\sigma_{e2},$$

$$\alpha^{13} = \alpha^3\sigma_{e1} + \alpha^8\sigma_{e3},$$

$$\alpha^3 = \alpha^8\sigma_{e2} + \alpha^8\sigma_{e3}.$$

By reducing these equations to upper triangular form, the decoder gets

$$\alpha^5 = \sigma_{e1} + \alpha^5\sigma_{e2},$$

$$\alpha^{10} = \sigma_{e2} + \sigma_{e3},$$

$$0 = 0.$$

From the vanishing of the third equation, it learns that only two errors actually occurred. Therefore it sets $\sigma_{e3}$ to zero and solves for $\sigma_{e1}$ and $\sigma_{e2}$, obtaining

$$\sigma_{e1} = \alpha^{10}, \qquad \sigma_{e2} = \alpha^{10}.$$

Finally, it evaluates the polynomial

$$\sigma_e(X) = X^2 + \sigma_{e1} X + \sigma_{e2} = X^2 + \alpha^{10} X + \alpha^{10}$$

for $X$ equal to each of the nonzero elements of GF(16); $\sigma_e(X) = 0$ when $X = \alpha^{14}$ and $X = \alpha^{11}$, so that these are the two error locators.


4.5 SOLVING FOR THE VALUES OF THE ERASED SYMBOLS

Once the errors have been located, they can be treated as erasures. We are then interested in the problem of determining the values of $s + t$ erased symbols, given that there are no errors in the remaining symbols. To simplify notation, we consider the problem of finding the $d_k$, given the $Y_k$, $1 \le k \le s$, and $t = 0$.

Since the parity-check equations are linear in the erasure values, we could solve $s$ of them for the $d_k$. There is another approach, however, which is more efficient.

As an aid to understanding the derivation of the next equation, imagine the following situation. To find $d_{k_o}$, suppose we continued to treat the remaining $s - 1$ erasures as erasures, but made a stab at guessing $d_{k_o}$. This would give us a word with $s - 1$ erasures and either one or (on the chance of a correct guess) zero errors. The rank of the matrix $M$ would therefore be either zero or one; but then the relevant $1 \times 1$ matrix would be simply a single modified cyclic parity check, formed from the elementary symmetric functions of the $s - 1$ remaining erasure locators. Its vanishing would therefore tell us when we had guessed $d_{k_o}$ correctly.

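To derive an explicit formula, let $\boldsymbol{\sigma}_{dk_o}$ be the vector of elementary symmetric functions $\sigma_{dk_o k}$ of the $s - 1$ erasure locators, excluding $Y_{k_o}$. Since $t = 0$, we have from (41)

$$\boldsymbol{\sigma}_{dk_o} \cdot \mathbf{S}_{(m_o+d-2,\; m_o+d-s-1)} = \sum_{k=1}^{s} d_k Y_k^{m_o+d-s-1}\, \boldsymbol{\sigma}_{dk_o} \cdot \mathbf{Y}_{k(s-1,0)},$$

and therefore

$$\begin{aligned} T^{(k_o)}_{d-s-1} \equiv \boldsymbol{\sigma}_{dk_o} \cdot \mathbf{S}_{(m_o+d-2,\; m_o+d-s-1)} &= d_{k_o} Y_{k_o}^{m_o+d-s-1}\, \sigma_{dk_o}(Y_{k_o}) + \sum_{k \ne k_o} d_k Y_k^{m_o+d-s-1}\, \sigma_{dk_o}(Y_k) \\ &= d_{k_o} Y_{k_o}^{m_o+d-s-1}\, \sigma_{dk_o}(Y_{k_o}), \end{aligned}$$

since $\sigma_{dk_o}(Y_k) = 0$, $k \ne k_o$. Thus

$$d_{k_o} = \frac{T^{(k_o)}_{d-s-1}}{Y_{k_o}^{m_o+d-s-1}\, \sigma_{dk_o}(Y_{k_o})}.$$

This gives us our explicit formula for $d_{k_o}$, valid for any $s$:

$$d_{k_o} = \frac{S_{m_o+d-2} - \sigma_{dk_o1} S_{m_o+d-3} + \sigma_{dk_o2} S_{m_o+d-4} - \cdots}{Y_{k_o}^{m_o+d-2} - \sigma_{dk_o1} Y_{k_o}^{m_o+d-3} + \sigma_{dk_o2} Y_{k_o}^{m_o+d-4} - \cdots}. \tag{45}$$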


Evidently we can find all erasure values in this way; each requires the calculation of the symmetric functions of a different $s - 1$ locators. Alternatively, after finding $d_1$, we could modify all parity checks to account for this information as follows:

$$\mathbf{S}'_{(m_o+d-2,\; m_o)} = \mathbf{S}_{(m_o+d-2,\; m_o)} - d_1 \mathbf{Y}_{1(m_o+d-2,\; m_o)},$$

and solve for $d_2$ in terms of these new parity checks and the remaining $s - 2$ erasure locators, and so forth.
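The sketch below combines Eq. 45 with the parity-check update just described. It assumes GF(16) built from x^4 + x + 1 and m_o = 1, as in Example 1, and its helper names are hypothetical. Each value uses the symmetric functions of the locators not yet resolved and is then subtracted out of every S_m, so the last value needs no symmetric functions at all. The code is run on the parity checks of Example 1, with the two located errors treated as erasures, and should return the four values listed there.

```python
from functools import reduce
from operator import xor

# GF(16) from the primitive polynomial x^4 + x + 1 (an assumption consistent with Example 1).
EXP, LOG = [0] * 30, [0] * 16
_v = 1
for _i in range(15):
    EXP[_i] = EXP[_i + 15] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x10:
        _v ^= 0b10011

def alpha(k):
    return EXP[k % 15]

def mul(u, v):
    return 0 if u == 0 or v == 0 else EXP[LOG[u] + LOG[v]]

def inv(u):
    return EXP[(15 - LOG[u]) % 15]

def power(u, k):
    return 1 if k == 0 else (0 if u == 0 else EXP[(LOG[u] * k) % 15])

def xor_all(values):
    return reduce(xor, values, 0)

def elementary_symmetric(roots):
    sig = [1]
    for z in roots:
        sig = [p ^ mul(z, q) for p, q in zip(sig + [0], [0] + sig)]
    return sig

def erasure_values(S, locators, m0, d):
    """Values at the given locators by Eq. 45, updating the parity checks after each one."""
    S = dict(S)                           # private copy of the parity checks
    top = m0 + d - 2
    values = []
    for idx, Y in enumerate(locators):
        # Earlier values are already removed from S, so only later locators remain erased.
        sig = elementary_symmetric(locators[idx + 1:])
        num = xor_all(mul(c, S[top - i]) for i, c in enumerate(sig))
        den = xor_all(mul(c, power(Y, top - i)) for i, c in enumerate(sig))
        val = mul(num, inv(den))
        values.append(val)
        for m in S:                       # S_m <- S_m - val * Y^m (minus is plus here)
            S[m] ^= mul(val, power(Y, m))
    return values

# Example 1 parity checks S_1..S_8, with both errors now treated as erasures.
S = dict(zip(range(1, 9), [alpha(k) for k in (14, 13, 5, 6, 9, 13, 10, 4)]))
locs = [alpha(14), alpha(11), alpha(13), alpha(12)]     # X_1, X_2, Y_1, Y_2
vals = erasure_values(S, locs, 1, 9)
```

The order in which the locators are resolved does not matter. Each step is exact, because only genuinely unresolved positions remain in the parity checks.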

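A similar argument leads to the formula for error values,

$$e_{j_o} = \frac{\boldsymbol{\sigma}_{ej_o} \cdot \mathbf{T}_{(d-s-2,\; d-s-t-1)}}{X_{j_o}^{m_o+d-s-t-1}\, \sigma_d(X_{j_o})\, \sigma_{ej_o}(X_{j_o})},$$

in terms of the modified cyclic parity checks, where $\boldsymbol{\sigma}_{ej_o}$ is the vector of elementary symmetric functions of the $t - 1$ error locators other than $X_{j_o}$. We could therefore find all error values by this formula, modify the parity checks accordingly, and then solve for the erasure values by Eq. 45.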
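A matching sketch for the error-value formula, under the same assumptions (GF(16) from x^4 + x + 1, m_o = 1, hypothetical helper names). Here σ_ej_o is formed from the other t - 1 error locators. Only the T_ℓ of Example 1 are used, never the raw S_m.

```python
from functools import reduce
from operator import xor

# GF(16) from the primitive polynomial x^4 + x + 1 (an assumption consistent with Example 1).
EXP, LOG = [0] * 30, [0] * 16
_v = 1
for _i in range(15):
    EXP[_i] = EXP[_i + 15] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x10:
        _v ^= 0b10011

def alpha(k):
    return EXP[k % 15]

def mul(u, v):
    return 0 if u == 0 or v == 0 else EXP[LOG[u] + LOG[v]]

def inv(u):
    return EXP[(15 - LOG[u]) % 15]

def power(u, k):
    return 1 if k == 0 else (0 if u == 0 else EXP[(LOG[u] * k) % 15])

def xor_all(values):
    return reduce(xor, values, 0)

def elementary_symmetric(roots):
    sig = [1]
    for z in roots:
        sig = [p ^ mul(z, q) for p, q in zip(sig + [0], [0] + sig)]
    return sig

def poly_eval(sig, z):
    return xor_all(mul(c, power(z, len(sig) - 1 - l)) for l, c in enumerate(sig))

def error_values(T, X, sigma_d, m0, d):
    """e_j for each error locator X_j, computed from the modified cyclic parity checks."""
    s, t = len(sigma_d) - 1, len(X)
    top = d - s - 2                        # the vector T_(d-s-2, d-s-t-1)
    out = []
    for jo, Xo in enumerate(X):
        sig = elementary_symmetric(X[:jo] + X[jo + 1:])
        num = xor_all(mul(c, T[top - i]) for i, c in enumerate(sig))
        den = mul(mul(power(Xo, m0 + d - s - t - 1), poly_eval(sigma_d, Xo)), poly_eval(sig, Xo))
        out.append(mul(num, inv(den)))
    return out

T = [alpha(8), alpha(8), 0, alpha(3), alpha(13), alpha(3)]   # Example 1
vals = error_values(T, [alpha(14), alpha(11)], [1, alpha(1), alpha(10)], 1, 9)
```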

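a. Example 2

As a continuation of Example 1, let the decoder solve for $e_1$. The elementary symmetric functions of $X_2$, $Y_1$, and $Y_2$ are

$$\sigma_1 = \alpha^6, \qquad \sigma_2 = \alpha^3, \qquad \sigma_3 = \alpha^6.$$

Therefore

$$e_1 = \frac{S_8 + \sigma_1 S_7 + \sigma_2 S_6 + \sigma_3 S_5}{X_1^8 + \sigma_1 X_1^7 + \sigma_2 X_1^6 + \sigma_3 X_1^5} = \frac{\alpha^4 + \alpha^6 \cdot \alpha^{10} + \alpha^3 \cdot \alpha^{13} + \alpha^6 \cdot \alpha^9}{\alpha^7 + \alpha^6 \cdot \alpha^8 + \alpha^3 \cdot \alpha^9 + \alpha^6 \cdot \alpha^{10}} = \frac{\alpha}{\alpha^{12}} = \alpha^4.$$

$e_2$ can be found similarly, or the decoder can calculate

$$S'_8 = S_8 + e_1 X_1^8 = \alpha^{13}, \qquad S'_7 = S_7 + e_1 X_1^7 = \alpha^3, \qquad S'_6 = S_6 + e_1 X_1^6 = 0.$$

Since $\sigma'_1 = Y_1 + Y_2 = \alpha$ and $\sigma'_2 = Y_1 Y_2 = \alpha^{10}$,

$$e_2 = \frac{S'_8 + \sigma'_1 S'_7 + \sigma'_2 S'_6}{X_2^8 + \sigma'_1 X_2^7 + \sigma'_2 X_2^6} = \frac{\alpha^{13} + \alpha \cdot \alpha^3}{\alpha^{13} + \alpha \cdot \alpha^2 + \alpha^{10} \cdot \alpha^6} = \frac{\alpha^{11}}{\alpha^{10}} = \alpha.$$

Also, $S''_8 = S'_8 + e_2 X_2^8 = \alpha^2$ and $S''_7 = S'_7 + e_2 X_2^7 = 0$, so that

$$d_1 = \frac{S''_8 + Y_2 S''_7}{Y_1^8 + Y_2 Y_1^7} = \frac{\alpha^2}{\alpha^{14} + \alpha^{12} \cdot \alpha} = \frac{\alpha^2}{\alpha^2} = 1,$$

and, with $S'''_8 = S''_8 + d_1 Y_1^8 = \alpha^{13}$,

$$d_2 = \frac{S'''_8}{Y_2^8} = \frac{\alpha^{13}}{\alpha^6} = \alpha^7.$$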


4.6 IMPLEMENTATION


We now consider how a BCH decoder might be realized as a special-purpose computer. We shall assume the availability of an arithmetic unit able to realize, in approximate order of complexity, the following functions of finite field elements: addition ($X = X_1 + X_2$), squaring ($X = X_1^2$), multiplication by $\alpha^m$, $m_o \le m \le m_o + d - 2$ ($X = \alpha^m X_1$), multiplication ($X = X_1 X_2$), and inversion ($X = X_1^{-1}$). Furthermore, because of the bistability of common computer elements, we shall assume $p = 2$, so that subtraction is equivalent to addition and squaring is linear. We let the locators $Z_i = \alpha^{n-i}$. Finally, we shall assume that all elements of the symbol field are converted to their representations in the locator field $GF(q) = GF(2^M)$, and that all operations are carried out in the larger field.

Peterson²⁹ and Bartee and Schneider³⁰ have considered the implementation of such an arithmetic unit; they have shown that multiplication and inversion, the two most difficult operations, can be accomplished serially in a number of elementary operations proportional to $M$. All registers will be $M$ bits long. Thus the hardware complexity is proportional to some small power of the logarithm of $q$, which exceeds the block length.

We attempt to estimate the approximate complexity of the algorithms described above by estimating the number of multiplications required by each and the number of memory registers.

During the computation, the received sequence of symbols must be stored in some buffer, awaiting correction. Once the $S_m$ and $Y_k$ have been determined, no further access to this sequence is required, until the sequence is read out and corrected.

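The calculation of the parity checks

$$S_m \equiv r(\alpha^m) = r_1 \alpha^{m(n-1)} + r_2 \alpha^{m(n-2)} + \cdots + r_n$$

is accomplished by the iteration

$$S_m \leftarrow \alpha^m S_m + r_i, \qquad i = 1, 2, \ldots, n,$$

starting from $S_m = 0$, which involves $n - 1$ multiplications by $\alpha^m$. A total of $d - 1$ such parity checks must be formed, requiring $d - 1$ registers.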
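The iteration is Horner's rule. Here is a minimal sketch, assuming GF(16) built from x^4 + x + 1 and locators Z_i = α^(n-i); the function names are hypothetical. The all-zero word is a code word, so feeding in only the error and erasure values of Example 1 should reproduce the eight parity checks given there. The result is also compared against direct evaluation of the sum.

```python
from functools import reduce
from operator import xor

# GF(16) from the primitive polynomial x^4 + x + 1 (an assumption consistent with Example 1).
EXP, LOG = [0] * 30, [0] * 16
_v = 1
for _i in range(15):
    EXP[_i] = EXP[_i + 15] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x10:
        _v ^= 0b10011

def alpha(k):
    return EXP[k % 15]

def mul(u, v):
    return 0 if u == 0 or v == 0 else EXP[LOG[u] + LOG[v]]

def power(u, k):
    return 1 if k == 0 else (0 if u == 0 else EXP[(LOG[u] * k) % 15])

def syndromes_horner(r, m0, d):
    """S_m = r(alpha^m) = r_1 alpha^(m(n-1)) + ... + r_n, by S <- alpha^m S + r_i."""
    out = []
    for m in range(m0, m0 + d - 1):
        am, acc = alpha(m), 0
        for ri in r:                      # the first product is by zero: n - 1 real multiplications
            acc = mul(am, acc) ^ ri
        out.append(acc)
    return out

n = 15
r = [alpha(4), 1, alpha(7), alpha(1)] + [0] * (n - 4)    # zero code word plus Example 1 pattern
S = syndromes_horner(r, 1, 9)
direct = [reduce(xor, (mul(ri, power(alpha(n - i), m)) for i, ri in enumerate(r, 1)), 0)
          for m in range(1, 9)]
assert S == direct
```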

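The vector $\boldsymbol{\sigma}_d$ can be calculated at the same time. We note that, if $\sigma_{dk}^{(k')}$ denotes the $k$th elementary symmetric function of the first $k'$ erasure locators,

$$\sigma_{dk}^{(k')} = \sigma_{dk}^{(k'-1)} + Y_{k'}\, \sigma_{d(k-1)}^{(k'-1)},$$

so that $\boldsymbol{\sigma}_d$ can be calculated by this recursion relation as each new $Y_k$ is determined. Adding a new $Y_k$ requires $s'$ multiplications when $s'$ are already determined, so that the total number of multiplications, given $s$ erasures, is

$$(s-1) + (s-2) + \cdots + 1 = \binom{s}{2} < d^2/2.$$

Only $s$ memory registers are required, since $\sigma_{d0} = 1$.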
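Here is a sketch of the recursion with an explicit multiplication counter, assuming GF(16) built from x^4 + x + 1; all names are hypothetical. The term Y · σ_d0 = Y is counted as free, which is what makes the total come out to C(s,2). Random locator sets are cross-checked against the product of the factors (Z - Y_k) at every field element.

```python
import random
from functools import reduce
from operator import xor

# GF(16) from the primitive polynomial x^4 + x + 1 (an assumption consistent with Example 1).
EXP, LOG = [0] * 30, [0] * 16
_v = 1
for _i in range(15):
    EXP[_i] = EXP[_i + 15] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x10:
        _v ^= 0b10011

def alpha(k):
    return EXP[k % 15]

def mul(u, v):
    return 0 if u == 0 or v == 0 else EXP[LOG[u] + LOG[v]]

def power(u, k):
    return 1 if k == 0 else (0 if u == 0 else EXP[(LOG[u] * k) % 15])

def xor_all(values):
    return reduce(xor, values, 0)

def sigma_incremental(locators):
    """Absorb locators one at a time; return (sigma, number of multiplications)."""
    sig, mults = [1], 0
    for Y in locators:
        new = sig + [0]
        for k in range(1, len(sig) + 1):
            if k == 1:
                new[k] ^= Y                   # Y * sigma_0 = Y costs nothing
            else:
                new[k] ^= mul(Y, sig[k - 1])  # sigma_k <- sigma_k + Y sigma_(k-1)
                mults += 1
        sig = new
    return sig, mults

rng = random.Random(3)
for s in range(9):
    Y = [alpha(k) for k in rng.sample(range(15), s)]
    sig, mults = sigma_incremental(Y)
    assert mults == s * (s - 1) // 2
    for z in range(16):                       # compare with the product of (z - Y_k)
        lhs = xor_all(mul(c, power(z, s - l)) for l, c in enumerate(sig))
        assert lhs == reduce(mul, (z ^ y for y in Y), 1)
```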

The modified cyclic parity checks $T_\ell$ are then calculated by Eqs. 40. Each requires $s$ multiplications, and there are $d - s - 1$ of them, so that their calculation requires $s(d-s-1) < d^2/4$ multiplications and $d - s - 1$ memory registers.

Equations 44 are then set up in $t_o(t_o+1) < d^2/4$ memory registers. In the worst case, $t = t_o$, the reduction to upper triangular form of these equations will require $t_o$ inversions and

$$t_o(t_o+1) + (t_o-1)t_o + \cdots + 1 \cdot 2 = \frac{t_o(t_o+1)(t_o+2)}{3} \approx \frac{t_o^3}{3}$$

multiplications. As $d$ becomes large, this step turns out to be the most lengthy, requiring as it does $\sim d^3/24$ multiplications.

Determination of $\boldsymbol{\sigma}_e$ from these reduced equations involves, in the worst case, a further $\binom{t_o}{2} < d^2/8$ multiplications, and $t_o$ memory registers.
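These counts are simple enough to tabulate. The sketch below only restates the formulas quoted above for given n, d and s, in the worst case t = t_o; the function name is hypothetical. The checks after it confirm the closed form of the reduction count, its d^3/24 growth for s = 0, and the d^2/4 bound on the cost of forming the T_ℓ.

```python
def worst_case_counts(n, d, s):
    """Multiplication counts quoted in the text for erasures-and-errors decoding (t = t_o)."""
    t0 = (d - s - 1) // 2
    return {
        "syndromes": (d - 1) * (n - 1),
        "sigma_d": s * (s - 1) // 2,
        "modified_checks": s * (d - s - 1),
        "reduction": sum(k * (k + 1) for k in range(1, t0 + 1)),
        "back_substitution": t0 * (t0 - 1) // 2,
    }

# Closed form used in the text: t0(t0+1) + ... + 1*2 = t0(t0+1)(t0+2)/3.
assert all(sum(k * (k + 1) for k in range(1, t0 + 1)) == t0 * (t0 + 1) * (t0 + 2) // 3
           for t0 in range(60))

# For large d (s = 0) the reduction dominates and approaches d^3/24.
big = worst_case_counts(n=4095, d=2001, s=0)
assert abs(big["reduction"] / (2001 ** 3 / 24) - 1) < 0.01
```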

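As Chien has shown, finding the roots of $\sigma_e(X)$ is facilitated by use of the special multipliers by $\alpha^m$ in the arithmetic unit. If

$$\sum_{j=0}^{t} \sigma_{ej} = 0,$$

then 1 is a root of $\sigma_e(X)$. Let $\sigma'_{ej} \equiv \alpha^{m_o+t-j}\sigma_{ej}$. Now

$$\sum_{j=0}^{t} \sigma'_{ej} = \alpha^{m_o} \sum_{j=0}^{t} \sigma_{ej}\, \alpha^{t-j} = \alpha^{m_o}\, \sigma_e(\alpha),$$

which will be zero when $\alpha$ is a root of $\sigma_e(X)$. Repeating the multiplication, after the $i$th step the sum is $\alpha^{i m_o}\sigma_e(\alpha^i)$, which will be zero when $\alpha^i = Z_{n-i}$ is a root of $\sigma_e(X)$. All error locators can therefore be found in $n$ such steps, each consisting of multiplications by the fixed elements $\alpha^{m_o}, \ldots, \alpha^{m_o+t}$, and stored in $t$ memory registers.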
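Here is a sketch of this search, assuming GF(16) built from x^4 + x + 1 and p = 2, so that σ_e(X) is the sum of σ_ej X^(t-j). At every step, register j is multiplied by the fixed element α^(m_o+t-j), so only the special multipliers are used. The name chien_search and the register layout are our illustration, not a hardware description from the text. Applied to the σ_e of Example 1, it should find α^11 and α^14, that is, positions 4 and 1.

```python
from functools import reduce
from operator import xor

# GF(16) from the primitive polynomial x^4 + x + 1 (an assumption consistent with Example 1).
EXP, LOG = [0] * 30, [0] * 16
_v = 1
for _i in range(15):
    EXP[_i] = EXP[_i + 15] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x10:
        _v ^= 0b10011

def alpha(k):
    return EXP[k % 15]

def mul(u, v):
    return 0 if u == 0 or v == 0 else EXP[LOG[u] + LOG[v]]

def xor_all(values):
    return reduce(xor, values, 0)

def chien_search(sigma_e, m0, n=15):
    """Exponents i with sigma_e(alpha^i) = 0, using only the fixed multipliers alpha^(m0+t-j)."""
    t = len(sigma_e) - 1
    gains = [alpha(m0 + t - j) for j in range(t + 1)]
    regs = list(sigma_e)                       # register j starts at sigma_ej
    roots = []
    for i in range(1, n + 1):
        regs = [mul(g, v) for g, v in zip(gains, regs)]
        # now regs[j] = sigma_ej * alpha^(i(m0+t-j)); their sum is alpha^(i m0) sigma_e(alpha^i)
        if xor_all(regs) == 0:
            roots.append(i % n)
    return roots

roots = chien_search([1, alpha(10), alpha(10)], m0=1)   # sigma_e of Example 1
positions = [(15 - k) % 15 or 15 for k in roots]          # Z_i = alpha^(n-i)
```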

Finally, we have only the problem of solving for $s + t$ erasures. We use (45), which requires the elementary symmetric functions of all erasure locators but one. Since

$$\sigma_{dk_o k} = Y_{k_o}^{-1}\left(\sigma_{d(k+1)} - \sigma_{dk_o(k+1)}\right),$$

we can begin with $\sigma_{dk_o(s-1)} = Y_{k_o}^{-1}\sigma_{ds}$ and find all the $\sigma_{dk_o k}$ from the $\sigma_{dk}$ with $s - 1$ multiplications and an inversion. Then the calculation of (45) requires $2(s+t-1)$ multiplications and an inversion. Doing this $s + t$ times, to find all erasure values, therefore requires $3(s+t)(s+t-1)$ multiplications and $s + t$ inversions. Or we can alter $s + t - 1$ parity checks after finding the value of the first erasure, and repeat with $s' = s + t - 1$ and so forth; under the assumption that all the needed powers $Y_k^m$ are readily available, this alternative requires only $2(s+t)(s+t-1)$ multiplications and $s + t$ inversions.
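Here is a sketch of the downward recursion, assuming GF(16) built from x^4 + x + 1; the helper names are hypothetical. Starting from σ'_s = 0, each step costs one multiplication by the precomputed Y_k_o^(-1), giving s - 1 multiplications and one inversion in total. For every choice of the excluded locator among the four locators of Example 2, the result is compared with a direct expansion.

```python
from functools import reduce
from operator import xor

# GF(16) from the primitive polynomial x^4 + x + 1 (an assumption consistent with Example 1).
EXP, LOG = [0] * 30, [0] * 16
_v = 1
for _i in range(15):
    EXP[_i] = EXP[_i + 15] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x10:
        _v ^= 0b10011

def alpha(k):
    return EXP[k % 15]

def mul(u, v):
    return 0 if u == 0 or v == 0 else EXP[LOG[u] + LOG[v]]

def inv(u):
    return EXP[(15 - LOG[u]) % 15]

def elementary_symmetric(roots):
    sig = [1]
    for z in roots:
        sig = [p ^ mul(z, q) for p, q in zip(sig + [0], [0] + sig)]
    return sig

def drop_locator(sigma_full, Y):
    """Symmetric functions of all locators except Y, via sigma'_k = Y^-1 (sigma_(k+1) - sigma'_(k+1))."""
    s = len(sigma_full) - 1
    y_inv = inv(Y)                            # the single inversion
    out = [1] + [0] * (s - 1)
    nxt = 0                                   # sigma'_s = 0
    for k in range(s - 1, 0, -1):             # s - 1 multiplications in all
        nxt = mul(y_inv, sigma_full[k + 1] ^ nxt)
        out[k] = nxt
    return out

locs = [alpha(14), alpha(11), alpha(13), alpha(12)]      # X_1, X_2, Y_1, Y_2
full = elementary_symmetric(locs)
for idx, Y in enumerate(locs):
    assert drop_locator(full, Y) == elementary_symmetric(locs[:idx] + locs[idx + 1:])
```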


a. Summary

To summarize, there are for any kind of decoding two steps in which the number of computations is proportional to $n$. If we restrict ourselves to correcting deletions only, then there is no step in which the number of computations is proportional to more than $d^2$. Otherwise, reduction of the matrix $M$ requires some computations that may be as large as $d^3$. If we are doing general minimum-distance decoding, then we may have to repeat the computation $d/2$ times, which leads to a total number of computations proportional to $d^4$. As for memory, we also have two kinds: a buffer with length proportional to $n$, and a number of live registers proportional to $d^2$. In sum, if $d = \delta n$, the total complexity of the decoder is proportional to $n^b$, where $b$ is some number of the order of 3.

All this suggests that if we are willing to use such a special-purpose computer as our decoder, or a specially programmed general-purpose machine, we can do quite powerful decoding without making the demands on this computer unreasonable.

Bartee and Schneider³² built such a computer for a (127,92) 5-error-correcting binary BCH code, using the Peterson algorithm.³³ More recently, Zierler³⁴ has studied the implementation of his algorithm for the (255,225) 15-error-correcting Reed-Solomon code on GF(256), both in a special-purpose and in a specially programmed small general-purpose computer, with results that verify the feasibility of such decoders.

b. Modified Deletions-and-Errors Decoding

If a code has minimum distance $d$, up to $s_o = d - 1$ deletions may be corrected, or up to $t_o \le (d-1)/2$ errors. We have seen that while the number of computations in the decoder was proportional to the cube of $t_o$, it is proportional only to the square of $s_o$. It may then be practical to make the probability of symbol error so much lower than that of symbol deletion that the probability of decoding error is negligibly affected when the decoder is set to correct only up to $t_1 < t_o$ errors. Such a tactic we call modified deletions-and-errors decoding, and we use it wherever we can in the computational program of Section VI.


V. EFFICIENCY AND COMPLEXITY


We shall now collect our major theoretical results on concatenated codes. We find that by concatenating we can achieve exponential decrease of probability of error with over-all block length, with only an algebraic increase in decoding complexity, for all rates below capacity; on an ideal superchannel with a great many inputs, Reed-Solomon codes can match the performance specified by the coding theorem; and with two stages of concatenation we can get a nonzero error exponent at all rates below capacity, although this exponent will be less than the unconcatenated exponent.

5.1 ASYMPTOTIC COMPLEXITY AND PERFORMANCE

We  have  previously  pointed  out  that  the  main  difficulty  with  the  coding  theorem  is  the 
complexity  of  the  decoding  schemes  required  to  achieve  the  performance  that  it  predicts. 

The  coding  theorem  establishes  precise  bounds  on  the  probability  of  error  for  block 
codes  in  terms  of  the  length  N  of  the  code  and  its  rate  R.  Informative  as  this  theorem 
is, it is not precisely what an engineer would prefer, namely, the relationship between
rate,  probability  of  error,  and  complexity.  Now  complexity  is  a  vague  term,  subsuming 
such  incommensurable  quantities  as  cost,  reliability,  and  delay,  and  often  depending  on 
details  of  implementation.  Therefore  we  should  not  expect  to  be  able  to  discover  more 
than  rough  relationships  in  this  area.  We  shall  investigate  such  relationships  in  the 
limit  of  very  complex  schemes  and  very  low  probabilities  of  error. 

We are interested in schemes that have at least two adjustable parameters, the rate $R$ and some characteristic length $L$, which for block codes will be proportional to the block length. We shall assume that the complexity of a scheme depends primarily on $L$. As $L$ becomes large, a single term will always dominate the complexity. In the case in which the complexity is proportional to some algebraic function of $L$, or in which different parts of the complexity are proportional to algebraic functions of $L$, that part of the complexity which is proportional to the largest power of $L$, say $L^a$, will be the dominant contributor to the complexity when $L$ is large, and we shall say the complexity is algebraic in $L$, or proportional to $L^a$. In the case in which some part of the complexity is proportional to the exponential of an algebraic function of $L$, this part becomes predominant when $L$ is large (since $e^x = 1 + x + x^2/2! + \cdots > x^a$ as $x \to \infty$), and we say the complexity is exponential in $L$.

Similarly, the probability of error might be either algebraic or exponential in $L$, though normally it is exponentially small. Since what we are really interested in is the relationship between probability of error and complexity for a given rate, we can eliminate $L$ from these two relationships in this way: if complexity is algebraic in $L$ while $\Pr(e)$ is exponential in $L$, $\Pr(e)$ is exponential in complexity, while if both complexity and $\Pr(e)$ are exponential in $L$, $\Pr(e)$ is only algebraic in complexity.

For example, the coding theorem uses maximum-likelihood decoding of block codes of length $N$ to achieve error probability $\Pr(e) \le e^{-NE(R)}$. Maximum-likelihood decoding involves $e^{NR}$ comparisons, so that the complexity is also exponential in $N$. Therefore, $\Pr(e)$ is only algebraic in the complexity; in fact, if we let $G$ be proportional to the complexity, $G = e^{NR}$, $(\ln G)/R = N$, and

$$\Pr(e) \le e^{-NE(R)} = G^{-E(R)/R}.$$

As we have previously noted, this relatively weak dependence of $\Pr(e)$ on the complexity has retarded practical application of the coding theorem.

Sequential decoding of convolutional codes has attracted interest because it can be shown that for rates less than a critical rate $R_{\text{comp}} < C$, the average number of computations is bounded, while the probability of error approaches zero. The critical liability of this approach is that the number of computations needed to decode a given symbol is a random variable, and that therefore a buffer of length $L$ must be provided to store incoming signals while the occasional long computation proceeds. Recent work³⁵ has shown that the probability of overflow of this buffer, for a given speed of computation, is proportional to $L^{-a}$, where $a$ is not large. In the absence of a feedback channel, buffer overflow is equivalent to system failure; thus the probability of such failure is only algebraically dependent upon the length of the buffer and hence on complexity.

Threshold  decoding  is  another  simple  scheme  for  decoding  short  convolutional  codes, 
but  it  has  no  asymptotic  performance.  As  we  have  seen,  BCH  codes  are  subject  to  the 
same  asymptotic  deficiency.  The  only  purely  algebraic  code  discovered  thus  far  that 
achieves arbitrarily low probability of error at a finite rate is Elias' scheme of iterating codes; but this rate is low.
Ziv has shown that by a three-stage concatenated code over a memoryless channel, a probability of error bounded by

-L'  ^ 

Pr(e)  <  K 

can be achieved, where $L$ is the total block length, while the number of computations required is proportional to $L^a$. His result holds for all rates less than the capacity of the original channel, although as $R \to C$, $a \to \infty$.

In the sequel we shall show that by concatenating an arbitrarily large number of stages of RS codes with suitably chosen parameters on a memoryless channel, the over-all probability of error can be bounded by

$$\Pr(e) \le p_o^{L^{(1-A)}},$$

where $L$ is proportional to the total block length, and $A$ is as small as desired, but positive. At the same time, if the complexity of the decoder for an RS code of length $n$ is proportional to $n^b$, say, the complexity of the entire decoder is proportional to $L^b$. From the discussion in Section IV, we know that $b$ is approximately 3. This result will obtain for all rates less than capacity.

We need a few lemmas to start. First, we observe that since a Reed-Solomon code of length $n$ and dimensionless rate $(1-2\beta)$ can correct up to $n\beta$ errors, on a superchannel with probability of error $p$,

$$\Pr(e) \le \binom{n}{n\beta} p^{n\beta} \le e^{-n[-\beta \ln p - \mathcal{H}(\beta)]}. \tag{46}$$

Here, we have used a union bound and

$$\binom{n}{n\beta} \le e^{n\mathcal{H}(\beta)}.$$

This is a very weak bound, but enough to show that the probability of error could be made to decrease exponentially with $n$ for any $\beta$ such that $-\beta \ln p - \mathcal{H}(\beta) > 0$ if it were possible to construct an arbitrarily long Reed-Solomon code. In fact, however, if there are $q$ inputs to the superchannel, with $q$ a prime power, $n \le q - 1$. We shall ignore the prime power requirement and the 'minus one' as trivial.

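It is easily verified that for $\beta \le 1/2$,

$$-\beta \ln \beta \ge -(1-\beta)\ln(1-\beta).$$

Therefore

$$-2\beta \ln \beta \ge \mathcal{H}(\beta) \ge -\beta \ln \beta, \qquad \beta \le 1/2. \tag{47}$$

Now we can show that when $(-\ln \beta) \ge (2a)^{1/(a-1)}$,

$$\mathcal{H}(\beta^a) \le \mathcal{H}^a(\beta). \tag{48}$$

For, by (47),

$$\mathcal{H}(\beta^a) \le -2\beta^a \ln \beta^a = \beta^a \cdot 2a(-\ln \beta),$$

$$\mathcal{H}^a(\beta) \ge \beta^a(-\ln \beta)^a;$$

but

$$2ax \le x^a \quad \text{when } x \ge (2a)^{1/(a-1)},$$

which proves Eq. 48. We note that when $\beta \le 1/e^4$ and $a \ge 2$, this condition is always satisfied. (In fact, by changing the base of the logarithm, we can prove a similar lemma for any $\beta < 1$, $a > 1$.)

Finally, when $x \ge y \ge 0$ and $a \ge 1$,

$$(x-y)^a = x^a\left(1 - \frac{y}{x}\right)^a \le x^a\left(1 - \frac{y}{x}\right) \le x^a\left(1 - \frac{y^a}{x^a}\right) = x^a - y^a. \tag{49}$$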
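The binomial bound used in (46) and the inequalities (47) through (49) can be spot-checked in floating point. The sketch below samples arbitrary grids of values, so it is a sanity check, not a proof. ℋ is the entropy function in natural logarithms, written H here.

```python
import math

def H(b):
    """Entropy function in nats."""
    return -b * math.log(b) - (1 - b) * math.log(1 - b)

# Bound used in (46): C(n, k) <= exp(n H(k/n)).
for n in range(2, 200):
    for k in range(1, n):
        assert math.log(math.comb(n, k)) <= n * H(k / n) + 1e-9

# (47): -2 b ln b >= H(b) >= -b ln b for 0 < b <= 1/2.
for j in range(1, 501):
    b = j / 1000
    assert -2 * b * math.log(b) + 1e-12 >= H(b) >= -b * math.log(b)

# (48): H(b^a) <= H(b)^a when b <= e^-4 and a >= 2.
for b in (math.exp(-4), 1e-3, 1e-5):
    for a in (2, 2.5, 3, 5, 10):
        assert H(b ** a) <= H(b) ** a

# (49): (x - y)^a <= x^a - y^a when x >= y >= 0 and a >= 1.
for x in (0.5, 1.0, 3.0, 10.0):
    for y in (0.0, 0.1 * x, 0.5 * x, x):
        for a in (1, 1.5, 2, 4):
            assert (x - y) ** a <= x ** a - y ** a + 1e-9
```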

We are now ready to construct our many-stage concatenated code. Suppose by some block-coding scheme or otherwise we have achieved a superchannel with $N_1$ inputs and outputs and a probability of error

$$\Pr(e) \le p_o = e^{-E}, \qquad E > 1. \tag{50}$$

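We now apply to this superchannel an RS code of dimensionless rate $(1-2\beta)$ and length $N_1$, achieving a probability of error, from (46), of

$$\Pr_1(e) \le e^{-N_1[\beta E - \mathcal{H}(\beta)]} \equiv e^{-E_1}. \tag{51}$$

Assume $\beta E - \mathcal{H}(\beta) > 0$, and define $a$ to satisfy

$$N_1[\beta E - \mathcal{H}(\beta)] = E_1 \equiv E^a;$$

thus

$$a = \frac{\ln N_1}{\ln E} + \frac{\ln[\beta E - \mathcal{H}(\beta)]}{\ln E}. \tag{52}$$

We assume that

$$\beta \le 1/e^4 \quad \text{and} \quad 2 \le a \le N_1(1-2\beta), \tag{53}$$

and we shall prove the theorem only for these conditions.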

This first concatenation creates a new superchannel with $N_1^{N_1(1-2\beta)}$ inputs and outputs and $\Pr(e) \le \exp(-E_1)$. Apply a second RS code to this new superchannel of length $N_2 = N_1^a$ and dimensionless rate $(1-2\beta^a)$. (That a code of this length exists is guaranteed by the condition of Eq. 53 that $a \le N_1(1-2\beta)$.) For this code,

$$\Pr(e) \le e^{-N_2[\beta^a E_1 - \mathcal{H}(\beta^a)]} \equiv e^{-E_2}. \tag{54}$$

But now

$$\begin{aligned} E_2 &= N_1^a[\beta^a E_1 - \mathcal{H}(\beta^a)] \\ &\ge N_1^a[\beta^a E_1 - \mathcal{H}^a(\beta)] \\ &\ge N_1^a[\beta E - \mathcal{H}(\beta)]^a \\ &= E^{a^2}. \end{aligned} \tag{55}$$

Here, we have used the inequalities of (48) and (49).

Thus by this second concatenation we achieve a code which, in terms of transmissions over the original superchannel, has length $N_1^{1+a}$, dimensionless rate $(1-2\beta)(1-2\beta^a)$, and $\Pr(e) \le \exp(-E^{a^2})$.

Obviously, if $\beta \le 1/e^4$, then $\beta^a \le 1/e^4$, and if $a \le N_1(1-2\beta)$, then $a \le N_2(1-2\beta^a)$. Therefore if we continue with any number of concatenations in this way, (53) remains satisfied, and relations like Eq. 55 obtain between any two successive exponents. After $n$ such concatenations, we have a code of dimensionless rate $(1-2\beta)(1-2\beta^a)\cdots(1-2\beta^{a^{n-1}})$, length $L = N_1^{1+a+\cdots+a^{n-1}}$, and $\Pr(e) \le \exp(-E^{a^n})$. Now, for $a \ge 2$ and $\beta < 1/2$,


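$$\begin{aligned}(1-2\beta)(1-2\beta^a)\cdots(1-2\beta^{a^{n-1}}) &\ge (1-2\beta)(1-2\beta^2)(1-2\beta^4)\cdots \\ &= 1 - 2\beta - 2\beta^2 + 4\beta^3 - 2\beta^4 + \cdots \\ &\ge 1 - 2\beta - 4\beta^2 - 8\beta^3 - 16\beta^4 - \cdots \\ &= 1 - 2\beta\left(\frac{1}{1-2\beta}\right) = \frac{1-4\beta}{1-2\beta}.\end{aligned}$$

Also,

$$\frac{a^n-1}{a-1}\ln N_1 = \ln L, \qquad a^n = 1 + (a-1)\frac{\ln L}{\ln N_1},$$

so that

$$\Pr(e) \le e^{-E^{a^n}} = e^{-E \cdot E^{(a-1)\ln L/\ln N_1}} = e^{-E L^{1-A}}$$

by substitution for $a$, where $A$ is defined by

$$A = -\frac{\ln[\beta - \mathcal{H}(\beta)/E]}{\ln N_1}.$$

Since $\beta E - \mathcal{H}(\beta)$ is assumed positive, but $\beta < 1$, $A$ is positive.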
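The closed form $\Pr(e) \le e^{-EL^{1-A}}$ can be checked numerically against $e^{-E^{a^n}}$. The sketch below uses a hypothetical helper, `many_stage`, which is not from the report. It computes $a$ from Eq. 52, $\ln L$, the product rate, and $A$. Every quantity is kept as a logarithm, because $E^{a^n}$ and $L$ overflow floating point almost immediately. The parameter values are arbitrary choices that satisfy (53).

```python
import math

def H(x: float) -> float:
    """Binary entropy in nats."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def many_stage(beta: float, E: float, N1: int, n: int) -> dict:
    """Parameters of the n-stage RS concatenation (hypothetical helper).

    Quantities that would overflow (L, E**(a**n)) are kept as logarithms.
    """
    gap = beta * E - H(beta)
    if gap <= 0:
        raise ValueError("the construction assumes beta*E - H(beta) > 0")
    a = (math.log(N1) + math.log(gap)) / math.log(E)            # Eq. 52
    ln_L = (a ** n - 1) / (a - 1) * math.log(N1)                # L = N1^(1+a+...+a^(n-1))
    rate = math.prod(1 - 2 * beta ** (a ** k) for k in range(n))
    A = -math.log(beta - H(beta) / E) / math.log(N1)
    return {
        "a": a,
        "ln_L": ln_L,
        "rate": rate,
        "A": A,
        "ln_exponent_direct": a ** n * math.log(E),             # ln of E^(a^n)
        "ln_exponent_closed": math.log(E) + (1 - A) * ln_L,     # ln of E * L^(1-A)
    }

params = many_stage(beta=0.1, E=5.0, N1=10**5, n=4)
for key, value in params.items():
    print(f"{key:>20s}: {value:.6g}")
```

With these values $a$ is about 6, so (53) holds. The product rate also stays above the bound $(1-4\beta)/(1-2\beta) = 0.75$.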

We now construct a concatenated code of rate $R' \ge C(1-\epsilon)$ for a memoryless channel with error exponent $E(R)$. Choose $R = (1-\delta)C > R'$, with $0 < \delta < \epsilon$, and $\beta = (\epsilon-\delta)/[2(1+\epsilon-\delta)]$, so that

$$\frac{1-4\beta}{1-2\beta}R = (1-\epsilon+\delta)(1-\delta)C \ge C(1-\epsilon).$$

We know there is some block code of length $N$ and rate $R$ such that $\Pr(e) \le \exp[-NE(R)]$. Now we can apply the concatenation scheme already described with $N_1 = \exp NR$, $E = NE(R)$, as long as

$$4 \le a = \frac{NR}{\ln NE(R)} + \frac{\ln[\beta NE(R) - \mathcal{H}(\beta)]}{\ln NE(R)} \le e^{NR}(1-2\beta).$$

It is obvious that there is an $N$ large enough so that this is true. Using this $N$, we achieve a scheme with rate greater than or equal to $C(1-\epsilon)$ and with probability of error

$$\Pr(e) \le e^{-NE(R)L^{1-A}}, \qquad A = -\frac{\ln[\beta - \mathcal{H}(\beta)/NE(R)]}{NR}.$$

Clearly, as long as $E(R) > 0$, $A$ can be made as small as desired by letting $N$ be sufficiently large. It remains positive, however, so that the error exponent $E$ defined by

$$E \equiv \lim_{L\to\infty} -\frac{1}{L}\log\Pr(e)$$

appears to go to zero, if this bound is tight.

That $E$ must be zero when an arbitrarily large number of minimum-distance codes are concatenated can be shown by the following simple lower bound. Suppose a code of length $N$ can correct up to $N\beta$ errors; since the minimum distance cannot exceed $N$, $\beta < 1/2$. Then on a channel with symbol probability of error $p$, a decoding error will certainly be made if the first $N\beta$ symbols are in error, so that

$$\Pr(e) \ge p^{N\beta}.$$

Concatenating a large number of such codes, we obtain

$$\Pr(e) \ge p^{(N_1N_2\cdots)(\beta_1\beta_2\cdots)}.$$

Now $N_1N_2\cdots = L$, the total block length, so that

$$E = \lim_{L\to\infty} -\frac{1}{L}\log\Pr(e) \le (-\log p)\lim(\beta_1\beta_2\cdots) = 0,$$

because $\beta_k < 1/2$. Since $E$ cannot be less than zero, it must actually be zero. In other words, by concatenating an infinite number of RS codes, we can approach as close to a nonzero error exponent as we wish, for any rate less than capacity, but we can never actually get one.

As was shown in Section III, decoding up to $t$ errors with an RS code requires a number of computations proportional to a power of $t$. We require only that the complexity of a decoder which can correct up to $N\beta$ errors be algebraic in $N$, or proportional to $N^b$, although in fact it appears that $b \approx 3$. After going to $n$ stages of concatenation according to the scheme above, the outermost decoder must correct $(N_1\beta)^{a^{n-1}}$ errors, the next outermost $(N_1\beta)^{a^{n-2}}$, and so forth. But in each complete block, the outermost decoder need compute only once, while the next outermost decoder must compute $N_1^{a^{n-1}}$ times, the next $N_1^{a^{n-1}}N_1^{a^{n-2}}$ times, and so forth. Hence the total number of computations is proportional to

$$\begin{aligned}G &= \left[(N_1\beta)^{a^{n-1}}\right]^b + N_1^{a^{n-1}}\left[(N_1\beta)^{a^{n-2}}\right]^b + N_1^{a^{n-1}+a^{n-2}}\left[(N_1\beta)^{a^{n-3}}\right]^b + \cdots \\ &\le N_1^{ba^{n-1}} + N_1^{a^{n-1}+ba^{n-2}} + N_1^{a^{n-1}+a^{n-2}+ba^{n-3}} + \cdots\end{aligned}$$

Since $ba \ge b + a$ for $a \ge 2$, $b \ge 2$, the first term in this series is dominant. Finally, since $L \ge N_1^{a^{n-1}}$,

$$G \lesssim L^b.$$

Thus the number of computations can increase only as a small power of $L$. The complexity of the hardware required to implement these computations is also increasing, but generally only in proportion to a power of $\log L$.
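The dominance argument can be made concrete by tabulating the exponents (base $N_1$) of the successive terms in the bound on $G$. The sketch below uses the illustrative choice $a = b = 3$, which is an assumption and not a value from the report. It lists those exponents and compares the first with $b\log_{N_1}L$. Like the bound in the text, it drops the factor $\beta$.

```python
def term_exponents(a: int, b: int, n: int) -> list:
    """Exponents (base N1) of the successive terms in the bound on G.

    Term k describes the decoder k levels in from the outside: it runs
    N1^(a^(n-1) + ... + a^(n-k)) times per block, and each run costs about
    N1^(b * a^(n-1-k)) operations (the factor beta is dropped, as in the bound).
    """
    exps = []
    for k in range(n):
        runs = sum(a ** (n - 1 - i) for i in range(k))   # empty sum (= 0) for k = 0
        exps.append(runs + b * a ** (n - 1 - k))
    return exps

def length_exponent(a: int, n: int) -> int:
    """Exponent (base N1) of the total length L = N1^(1 + a + ... + a^(n-1))."""
    return sum(a ** k for k in range(n))

a, b, n = 3, 3, 5
t = term_exponents(a, b, n)
print("term exponents:", t)          # [243, 162, 135, 126, 123]
print("first term dominant:", t[0] == max(t))
print("first term <= b * log_N1(L):", t[0] <= b * length_exponent(a, n))
```

Successive exponents differ by $a^{n-2-k}(ab-a-b)$. This is why the series is dominated by its first term whenever $ab \ge a+b$.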

This result is not to be taken as a guide to design; in practice one finds it unnecessary to concatenate a large number of codes, as two stages generally suffice. It does indicate that concatenation is a powerful tool for getting exponentially small probabilities of error without an exponentially large decoder.

5.2 CODING THEOREM FOR IDEAL SUPERCHANNELS

We recall that an ideal superchannel is the $q$-input, $q$-output memoryless channel which is symmetric from the input and the output and has equiprobable errors. If its total probability of error is $p$, its transition probability matrix is

$$p_{ji} = \begin{cases} 1-p, & i = j, \\ \dfrac{p}{q-1}, & i \ne j. \end{cases} \tag{59}$$


We shall now calculate the unexpurgated part of the coding theorem bound for this channel, in the limit as $q$ becomes very large. The result will tell us how well we can hope to do with any code when we assume we are dealing with an ideal superchannel. Then we shall find that over an interesting range Reed-Solomon codes are capable of achieving this standard. Finally, we shall use these results to compute performance bounds for concatenated codes.

Specialized to a symmetric discrete memoryless channel, the coding theorem asserts that there exists a code of length $n$ and rate $R$ which with maximum-likelihood decoding will yield a probability of error bounded by

$$\Pr(e) \le e^{-nE(R)}, \tag{60}$$

where

$$E(R) = \max_{0 \le \rho \le 1}\{E_0(\rho) - \rho R\},$$

$$E_0(\rho) = -\ln\sum_{j=1}^{q}\left[\sum_{i=1}^{q}\frac{1}{q}\,p_{ji}^{1/(1+\rho)}\right]^{1+\rho}. \tag{61}$$


Substituting Eq. 59 in Eq. 61, we obtain for the ideal superchannel

$$E_0(\rho) = -\ln q^{-\rho}\left[(1-p)^{1/(1+\rho)} + (q-1)^{\rho/(1+\rho)}\,p^{1/(1+\rho)}\right]^{1+\rho}. \tag{62}$$

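To facilitate handling Eq. 62 when $q$ becomes large, we substitute $\rho' = \rho\ln q$ and the dimensionless rate $r = R/\ln q$; then

$$\Pr(e) \le e^{-nE(r)}, \qquad E(r) = \max_{0 \le \rho' \le \ln q}\{E_0(\rho') - \rho' r\},$$

$$E_0(\rho') = -\ln e^{-\rho'}\left[(1-p)^{\ln q/(\ln q+\rho')} + (q-1)^{\rho'/(\ln q+\rho')}\,p^{\ln q/(\ln q+\rho')}\right]^{(\ln q+\rho')/\ln q}. \tag{63}$$

We consider first the case in which $p$ is fixed, while $q$ becomes very large. For $\rho' \ge 0$, $E_0(\rho')$ becomes

$$E_0(\rho') = -\ln e^{-\rho'}\left[(1-p) + pe^{\rho'}\right] = \rho' - \ln\left[(1-p) + pe^{\rho'}\right].$$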

In the maximization of $E(r)$, $\rho'$ can now be as large as desired, so that the curved, unexpurgated part of the coding theorem bound is the entire bound; by setting the derivative of $E(r)$ to zero, we obtain

$$r = \frac{\partial E_0(\rho')}{\partial \rho'} = 1 - \frac{pe^{\rho'}}{(1-p)+pe^{\rho'}} = \frac{1-p}{(1-p)+pe^{\rho'}},$$

or

$$e^{\rho'} = \frac{1-p}{p}\cdot\frac{1-r}{r}.$$

Thus,

$$E(r) = (1-r)\ln\frac{1-r}{p} + r\ln\frac{r}{1-p} = -r\ln(1-p) - (1-r)\ln p - \mathcal{H}(r). \tag{64}$$

This bound will be recognized as equal to the Chernoff bound on the probability of getting more than $n(1-r)$ errors in $n$ transmissions, when the probability of error on any transmission is $p$. It suggests that a maximum-likelihood decoder for a good code corrects all patterns of $n(1-r)$ or fewer errors.
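Eq. 62 can be compared numerically with the large-$q$ limit (64). The Python sketch below uses illustrative function names. It evaluates $E_0(\rho)$ in the log domain and maximizes $E_0(\rho) - \rho R$ over $0 \le \rho \le 1$ by ternary search. Ternary search is legitimate here because $E_0(\rho) - \rho R$ is concave in $\rho$. The result is printed next to the limit formula for a few values of $\ln q$. The values $p = 0.1$ and $r = 0.7$ are arbitrary.

```python
import math

def H(x):
    """Binary entropy in nats."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def E0_ideal(rho, log_q, p):
    """Eq. 62 for the ideal q-ary superchannel, evaluated in the log domain.

    ln q is passed instead of q so that very large q does not overflow.
    """
    s = 1.0 / (1.0 + rho)
    log_qm1 = math.log(math.expm1(log_q))                 # ln(q - 1)
    t1 = s * math.log(1 - p)                              # ln (1-p)^(1/(1+rho))
    t2 = rho * s * log_qm1 + s * math.log(p)              # ln (q-1)^(rho/(1+rho)) p^(1/(1+rho))
    m = max(t1, t2)
    log_bracket = m + math.log(math.exp(t1 - m) + math.exp(t2 - m))
    return rho * log_q - (1 + rho) * log_bracket

def exponent(r, log_q, p, iters=200):
    """max over 0 <= rho <= 1 of E0(rho) - rho R, with R = r ln q (ternary search)."""
    R = r * log_q
    f = lambda rho: E0_ideal(rho, log_q, p) - rho * R
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f((lo + hi) / 2)

def large_q_limit(r, p):
    """Eq. 64: -r ln(1-p) - (1-r) ln p - H(r)."""
    return -r * math.log(1 - p) - (1 - r) * math.log(p) - H(r)

p, r = 0.1, 0.7
for log_q in (10.0, 50.0, 230.0):
    print(log_q, exponent(r, log_q, p), large_q_limit(r, p))
```

For large $\ln q$ the numerical maximum should be close to the value from Eq. 64, within a few percent at $\ln q = 230$. The remaining gap comes from the exponents $\ln q/(\ln q+\rho')$ not yet being equal to 1.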

On the other hand, a code capable of correcting all patterns of $n(1-r)$ or fewer errors must have minimum distance $2n(1-r)$, thus at least $2n(1-r)$ check symbols, and dimensionless rate $r' = 1 - 2(1-r) < r$. No code of dimensionless rate $r$ can correct all patterns of $n(1-r)$ or fewer errors. What must happen is that a good code corrects the great majority of error patterns beyond its minimum distance, out to $n(1-r)$ errors.

We shall show that on an ideal superchannel with $q$ very large, Reed-Solomon codes do just about this, and come arbitrarily close to matching the performance of the coding theorem.

One way of approximating an ideal superchannel is to use a block code and decoder of length $N$ and rate $R$ over a raw channel with error exponent $E(R)$; then with $e^{NR}$ inputs we have $\Pr(e) \le e^{-NE(R)}$. We are thus interested in the case in which

$$q = e^{NR}, \qquad p = e^{-NE}. \tag{65}$$

Substituting Eqs. 65 in Eqs. 63, and using $\rho' = \rho\ln q = \rho NR$, we obtain

$$\Pr(e) \le e^{-nE(r)}, \qquad E(r) = \max_{0 \le \rho \le 1}\{E_0(\rho) - \rho NRr\},$$

$$E_0(\rho) = -\ln e^{-\rho NR}\left[(1-e^{-NE})^{1/(1+\rho)} + (e^{NR}-1)^{\rho/(1+\rho)}\,e^{-NE/(1+\rho)}\right]^{1+\rho}. \tag{66}$$

When $N$ becomes large, one or the other of the two terms within the brackets in this last equation dominates, and $E_0(\rho)$ becomes

$$E_0(\rho) = \begin{cases} \rho NR, & \rho NR \le NE, \\ NE, & NE \le \rho NR, \end{cases}$$

or

$$E_0(\rho) = N\min\{\rho R, E\}. \tag{67}$$
The maximization of $E(r)$ in (66) is achieved by setting $\rho = E/R$ if $E/R \le 1$, and $\rho = 1$ otherwise. Thus

$$E(r) = \begin{cases} NE(1-r), & E \le R, \\ NR(1-r), & E > R, \end{cases}$$

or

$$E(r) = N(1-r)\min\{E, R\}. \tag{68}$$

In the next section we shall only be interested in the case $E \le R$, which corresponds to the curved portion of the bound, for which we have

$$\Pr(e) \le e^{-nNE(1-r)}. \tag{69}$$
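The asymptotic statement (68) can be illustrated the same way for $q = e^{NR}$, $p = e^{-NE}$. The sketch below uses the illustrative parameters $R = 0.5$, $E = 0.3$, $r = 0.8$, chosen so that $E \le R$. Everything is computed in the log domain, since $e^{NR}$ overflows for the larger $N$. The exponent per unit $N$ is compared with $(1-r)\min\{E, R\}$.

```python
import math

def E0_super(rho, N, R, E):
    """Eq. 66: E0(rho) for q = e^(NR), p = e^(-NE), computed in the log domain."""
    s = 1.0 / (1.0 + rho)
    log_qm1 = N * R + math.log1p(-math.exp(-N * R))       # ln(e^(NR) - 1)
    t1 = s * math.log1p(-math.exp(-N * E))                # ln (1 - e^(-NE))^(1/(1+rho))
    t2 = rho * s * log_qm1 - s * N * E                    # ln of the second bracketed term
    m = max(t1, t2)
    log_bracket = m + math.log(math.exp(t1 - m) + math.exp(t2 - m))
    return rho * N * R - (1 + rho) * log_bracket

def exponent_per_N(r, N, R, E, iters=200):
    """max over 0 <= rho <= 1 of [E0(rho) - rho N R r] / N, by ternary search (concave in rho)."""
    f = lambda rho: E0_super(rho, N, R, E) - rho * N * R * r
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f((lo + hi) / 2) / N

R, E, r = 0.5, 0.3, 0.8
for N in (20, 200, 2000):
    print(N, exponent_per_N(r, N, R, E), (1 - r) * min(E, R))   # Eq. 68 divided by N
```

The finite-$N$ value stays slightly below the limit. The cause is the $\ln 2$-sized cross term at $\rho = E/R$, which becomes negligible relative to $N$.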


5.3  PERFORMANCE  OF  RS  CODES  ON  THE  IDEAL  SUPERCHANNEL 


We shall show that on an ideal superchannel (which suits RS codes perfectly), RS codes are capable of matching arbitrarily closely the coding theorem bounds, Eqs. 64 and 69, as long as $q$ is sufficiently large. From these results we infer that RS codes are as good as any whenever we are content to treat the superchannel as ideal.

a. Maximum-Likelihood Decoding

We shall first investigate the performance of RS codes on a superchannel with large $q$ and fixed $p$, for which we have shown (Eq. 64) that there exists a code with

$$\Pr(e) \le e^{-n[-(1-r)\ln p - r\ln(1-p) - \mathcal{H}(r)]}.$$

What RS codes can achieve in this case is stated precisely in the following theorem.

THEOREM: For any $r > 1/2$, any $\delta$ such that $1/4 > \delta > 0$, and any $p$ such that $1/4 > p > 0$, there exists a number $Q$ such that for all ideal superchannels with probability of error $p$ and $q > Q$ inputs, use of a Reed-Solomon code of length $n \le q - 1$ and dimensionless rate $r$ with maximum-likelihood decoding will result in a probability of error bounded by

$$\Pr(e) \le 3e^{-n[-(1-r)\ln p - r\ln(1-p) - \mathcal{H}(r) - \delta]}.$$

PROOF: Let $P_i$ be the probability that a decoding error is made, given $i$ symbol errors. Then

$$\Pr(e) = \sum_{i=0}^{n} P_i\binom{n}{i}p^i(1-p)^{n-i}.$$

The idea of the proof is to find a bound for $P_i$ which is less than one for $i \le t$, and then to split this series into two parts, bounding $P_i$ by 1 in the second:

$$\Pr(e) \le \sum_{i=0}^{t} P_i\binom{n}{i}p^i(1-p)^{n-i} + \sum_{i=t+1}^{n}\binom{n}{i}p^i(1-p)^{n-i}, \tag{70}$$

in which, because $P_i$ falls off rapidly with decreasing $i$, the dominating term in the first series is the last, while that in the second series is the first.

We first bound $P_i$ for $i \le d - 1$. Consider a single code word of weight $w$. By changing any $k$ of its nonzero elements to zeros, any $m$ of its nonzero elements to any of the other $(q-2)$ nonzero field elements, and any $\ell$ of its zero elements to any of the $(q-1)$ nonzero field elements, we create a word of weight $i = w + \ell - k$, and at distance $j = k + \ell + m$ from the code word. The total number of words that can be so formed is

$$\binom{w}{k,\,m}\binom{n-w}{\ell}(q-2)^m(q-1)^\ell.$$

Here, the notation $\binom{w}{k,\,m}$ indicates the trinomial coefficient

$$\binom{w}{k,\,m} = \frac{w!}{k!\,m!\,(w-m-k)!},$$

which is the total number of ways a set containing $w$ elements can be separated into subsets of $k$, $m$, and $(w-m-k)$ elements. The total number, $N_{ij}$, of words of weight $i$ and distance $j$ from some code word is then upperbounded by

$$N_{ij} \le \sum_{\substack{w,\,k,\,\ell,\,m \\ i = w+\ell-k \\ j = k+\ell+m}} N_w\binom{w}{k,\,m}\binom{n-w}{\ell}(q-2)^m(q-1)^\ell, \tag{71}$$

where $N_w$ is the total number of code words of weight $w$. The reason that this is an upper bound is that some words of weight $i$ may be distance $j$ from two or more code words.
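The counting formula for words derived from a single code word can be verified by brute force for a tiny alphabet. The sketch below is an illustration only. It represents field elements as the integers $0, \dots, q-1$, with 0 as the zero element and the code word's nonzero entries all equal to 1. This representation is enough for counting, because only the pattern of zero, equal, and different symbols matters. For a single code word the count is exact; the inequality in (71) arises only from overlaps between different code words.

```python
from itertools import product
from math import comb, factorial

def trinomial(w, k, m):
    """Number of ways to split w items into groups of sizes k, m, and w - k - m."""
    return factorial(w) // (factorial(k) * factorial(m) * factorial(w - k - m))

def count_formula(n, q, w, i, j):
    """Words of weight i at distance j from one fixed word of weight w, summing
    the text's expression over (k, m, l) with i = w + l - k and j = k + l + m."""
    total = 0
    for k in range(w + 1):
        for m in range(w - k + 1):
            l = i - w + k
            if 0 <= l <= n - w and k + l + m == j:
                total += trinomial(w, k, m) * comb(n - w, l) * (q - 2) ** m * (q - 1) ** l
    return total

def count_brute(n, q, w, i, j):
    """Direct enumeration over all q^n words (symbols 0..q-1, 0 = zero element)."""
    c = tuple([1] * w + [0] * (n - w))          # a fixed word of weight w
    cnt = 0
    for y in product(range(q), repeat=n):
        weight = sum(1 for s in y if s != 0)
        dist = sum(1 for a, b in zip(c, y) if a != b)
        if weight == i and dist == j:
            cnt += 1
    return cnt

n, q, w = 5, 4, 3
ok = all(count_formula(n, q, w, i, j) == count_brute(n, q, w, i, j)
         for i in range(n + 1) for j in range(n + 1))
print(ok)
```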

We have shown (see Section III) that for a Reed-Solomon code,

$$N_w \le \binom{n}{w}(q-1)^{w-d+1}.$$

Substituting this expression in (71) and letting $k = j - \ell - m$, $w = i + j - m - 2\ell$, we obtain

$$N_{ij} \le \sum_{m \ge 0}\sum_{\ell \ge 0}\frac{n!\,(q-2)^m(q-1)^{i+j-m-\ell-d+1}}{m!\,\ell!\,(j-\ell-m)!\,(i-\ell-m)!\,(n-i-j+m+\ell)!}. \tag{72}$$

A more precise specification of the ranges of $m$ and $\ell$ is not necessary for our purposes.

The ratio of the $(\ell+1)$th to the $\ell$th term in this series, for a given $m$,

$$\frac{(q-1)^{-1}(j-\ell-m)(i-\ell-m)}{(\ell+1)(n-i-j+m+\ell+1)},$$

is upperbounded by

$$\frac{(d-1)^2(q-1)^{-1}}{(\ell+1)[n-2(d-1)]} \le \frac{n^2(1-r)^2}{(\ell+1)(q-1)\,n(2r-1)} \le \frac{(1-r)^2}{(\ell+1)(2r-1)}.$$

Here, we have used $r > 1/2$, $j \le i \le d-1 = n(1-r)$, $\ell \ge 0$, $m \ge 0$, and $n \le q-1$. Defining

$$C_1 = \frac{(1-r)^2}{2r-1},$$

we have

$$N_{ij} \le \sum_m\frac{n!\,(q-2)^m(q-1)^{i+j-m-d+1}}{m!\,(j-m)!\,(i-m)!\,(n-i-j+m)!}\sum_\ell\frac{C_1^\ell}{\ell!} \le e^{C_1}\sum_m\frac{n!\,(q-2)^m(q-1)^{i+j-m-d+1}}{m!\,(j-m)!\,(i-m)!\,(n-i-j+m)!}. \tag{73}$$

Similarly, the ratio of the $(m+1)$th to the $m$th term in the series of (73),

$$\frac{(q-2)(j-m)(i-m)}{(q-1)(m+1)(n-i-j+m+1)},$$

is upperbounded by

$$\frac{(d-1)^2}{(m+1)[n-2(d-1)]} = \frac{nC_1}{m+1},$$

so that

$$N_{ij} \le e^{C_1}\frac{n!\,(q-1)^{i+j-d+1}}{j!\,i!\,(n-i-j)!}\sum_m\frac{(nC_1)^m}{m!} \le e^{C_1(n+1)}\binom{n}{i,\,j}(q-1)^{i+j-d+1}. \tag{74}$$


Since the total number of $i$-weight words is

$$\binom{n}{i}(q-1)^i,$$

the probability that a randomly chosen word of weight $i$ will be distance $j$ from some code word is bounded by

$$\frac{N_{ij}}{\binom{n}{i}(q-1)^i} \le e^{C_1(n+1)}\frac{(n-i)!\,(q-1)^{j+1-d}}{j!\,(n-i-j)!},$$

and the total probability that a word of weight $i$ will be distance $j \le i$ from some code word is bounded by

$$P_i \le e^{C_1(n+1)}\sum_{j=0}^{i}\frac{(n-i)!\,(q-1)^{j+1-d}}{j!\,(n-i-j)!},$$

or, if we substitute $j' = i - j$,

$$P_i \le e^{C_1(n+1)}\sum_{j'=0}^{i}\frac{(n-i)!\,(q-1)^{i-j'+1-d}}{(i-j')!\,(n-2i+j')!}. \tag{75}$$

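The ratio of the $(j'+1)$th to the $j'$th term in the series of (75),

$$\frac{i-j'}{(q-1)(n-2i+j'+1)},$$

is upperbounded by

$$C_2 = \frac{d-1}{(q-1)[n-2(d-1)]} = \frac{1-r}{(q-1)(2r-1)},$$

so that

$$P_i \le e^{C_1(n+1)}\frac{(n-i)!\,(q-1)^{i+1-d}}{i!\,(n-2i)!}\cdot\frac{1}{1-C_2}.$$

If

$$q - 1 \ge \frac{2(1-r)}{2r-1}, \tag{76}$$

so that $C_2 \le 1/2$,

$$P_i \le 2e^{C_1(n+1)}\binom{n-i}{i}(q-1)^{i+1-d}. \tag{77}$$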

Substituting (77) in (70), we obtain

$$\Pr(e) \le \sum_{i=0}^{t}2e^{C_1(n+1)}\binom{n-i}{i}(q-1)^{i+1-d}\binom{n}{i}p^i(1-p)^{n-i} + \sum_{i=t+1}^{n}\binom{n}{i}p^i(1-p)^{n-i}. \tag{78}$$


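We let

$$\epsilon \equiv \frac{d-1-t}{n} > 0,$$

so that $t = n(1-r-\epsilon)$. The second series of (78) is just the probability that more than $t$ errors occur, which is Chernoff-bounded by

$$e^{-n[-(1-r-\epsilon)\ln p - (r+\epsilon)\ln(1-p) - \mathcal{H}(r+\epsilon)]}, \qquad (1-r-\epsilon) > p. \tag{79}$$

(If $\epsilon < \delta$, $1-r-\epsilon > 1/4 > p$.) Setting $i' = t - i$, we write the first series of (78) as

$$S_1 = 2e^{C_1(n+1)}\sum_{i'=0}^{t}\frac{n!\,(q-1)^{t+1-d-i'}\,p^{t-i'}(1-p)^{n-t+i'}}{(t-i')!\,(t-i')!\,(n-2t+2i')!}. \tag{80}$$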

The ratio of the $(i'+1)$th to the $i'$th term in the series of Eq. 80,

$$\frac{(1-p)(t-i')^2}{p(q-1)(n-2t+2i'+1)(n-2t+2i'+2)},$$

is upperbounded by

$$C_3 = \frac{(1-p)(d-1)^2}{p(q-1)[n-2(d-1)]^2} = \frac{(1-p)(1-r)^2}{p(q-1)(2r-1)^2},$$

so that

$$S_1 \le 2e^{C_1(n+1)}\frac{n!\,(q-1)^{t+1-d}\,p^t(1-p)^{n-t}}{t!\,t!\,(n-2t)!}\cdot\frac{1}{1-C_3}.$$

If

$$q - 1 \ge 2\,\frac{1-p}{p}\cdot\frac{(1-r)^2}{(2r-1)^2}, \tag{81}$$

so that $C_3 \le 1/2$,

$$S_1 \le 4e^{C_1(n+1)}\frac{n!\,(q-1)^{t+1-d}\,p^t(1-p)^{n-t}}{t!\,t!\,(n-2t)!}. \tag{82}$$


Substituting $P_t$ from (77) in (82), we obtain

$$S_1 \le 2P_t\binom{n}{t}p^t(1-p)^{n-t}. \tag{83}$$


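Substituting (83) in (78), with the use of

$$\binom{n}{t} \le e^{n\mathcal{H}(t/n)},$$

we have

$$\Pr(e) \le (2P_t + 1)\,e^{-n[-(1-r-\epsilon)\ln p - (r+\epsilon)\ln(1-p) - \mathcal{H}(r+\epsilon)]}.$$

Choose

$$\epsilon = \frac{\delta}{\ln(1-p) - \ln p}; \tag{84}$$

since $p < 1/4$, $\epsilon < \delta$, and Eq. 79 is valid. Since $\mathcal{H}(r+\epsilon) \le \mathcal{H}(r)$ for $r \ge 1/2$,

$$\Pr(e) \le (2P_t + 1)\,e^{-n[-(1-r)\ln p - r\ln(1-p) - \mathcal{H}(r) - \delta]}.$$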
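Finally, for this choice of $\epsilon$, from (77),

$$P_t \le e^{n[C_1 + 1 - \epsilon\ln(q-1)] + [C_1 + \ln 2]}, \tag{85}$$

in which we have used $d - 1 - t = n\epsilon$ and

$$\binom{n-t}{t} \le 2^{n-t} \le e^{n}.$$

Thus $P_t \le 1$ if

$$\ln(q-1) \ge \frac{1}{\epsilon}\left[C_1 + 1 + \frac{C_1 + \ln 2}{n}\right],$$

which is implied by

$$\ln(q-1) \ge \frac{\ln(1-p) - \ln p}{\delta}\left[2C_1 + 1 + \ln 2\right], \tag{86}$$

in which we have used $n \ge 1$ and substituted for $\epsilon$ by Eq. 84. Defining $C_4 = 2C_1 + 1 + \ln 2$, (86) can be written

$$q - 1 \ge \left(\frac{1-p}{p}\right)^{C_4/\delta}. \tag{87}$$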

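When this is satisfied,

$$\Pr(e) \le 3e^{-n[-(1-r)\ln p - r\ln(1-p) - \mathcal{H}(r) - \delta]}, \tag{88}$$

as was to be proved. Equation 88 holds if (76), (81), and (87) are simultaneously satisfied, which is to say if $q - 1 \ge Q$, with

$$Q = \max\left\{\frac{2(1-r)}{2r-1},\ \frac{2(1-p)(1-r)^2}{p(2r-1)^2},\ \left(\frac{1-p}{p}\right)^{C_4/\delta}\right\}. \tag{89}$$

Q.E.D.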
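The threshold $Q$ of Eq. 89 is straightforward to evaluate. The hypothetical helper below returns $\ln Q$ together with the logarithms of its three components, which come from (76), (81), and (87). The sample values of $r$, $p$, and $\delta$ are arbitrary, and the input checks simply mirror the hypotheses of the theorem.

```python
import math

def rs_threshold_Q(r, p, delta):
    """ln Q for Eq. 89 (the theorem's bound holds once q - 1 >= Q), plus its components.

    Logarithms are returned because the third term is usually astronomically large.
    """
    if not (r > 0.5 and 0 < p < 0.25 and 0 < delta < 0.25):
        raise ValueError("theorem assumes r > 1/2, 0 < p < 1/4, 0 < delta < 1/4")
    C1 = (1 - r) ** 2 / (2 * r - 1)
    C4 = 2 * C1 + 1 + math.log(2)
    logs = [
        math.log(2 * (1 - r) / (2 * r - 1)),                              # Eq. 76
        math.log(2 * (1 - p) * (1 - r) ** 2 / (p * (2 * r - 1) ** 2)),    # Eq. 81
        (C4 / delta) * math.log((1 - p) / p),                             # Eq. 87
    ]
    return max(logs), logs

lnQ, parts = rs_threshold_Q(r=0.75, p=0.01, delta=0.2)
print(f"ln Q = {lnQ:.2f}  (components: {[round(x, 2) for x in parts]})")
```

For parameters like these the third term, from (87), dominates by a wide margin. This illustrates how large $q$ must be before the theorem applies.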


From this result we can derive a corollary that applies to the case in which $q = e^{NR}$, $p = e^{-NE}$, for which we have found the coding theorem bound, when $E \le R$ (Eq. 69),

$$\Pr(e) \le e^{-nNE(1-r)}.$$

COROLLARY: For $E < R$, $r > 1/2$, and any $\delta' > 0$, there exists an $N_0$ such that for all $N \ge N_0$, use of a Reed-Solomon code of dimensionless rate $r$ and length $n \le q - 1$ with maximum-likelihood decoding on an ideal superchannel with probability of error $p = e^{-NE}$ and $q = e^{NR}$ inputs will yield an over-all probability of error bounded by

$$\Pr(e) \le 3e^{-nN[E(1-r) - \delta']}.$$

Proof: The proof follows immediately from the previous theorem if we let $\delta = N\delta' - \mathcal{H}(r)$, which will be positive for

$$N > \frac{\mathcal{H}(r)}{\delta'}. \tag{90}$$

For then, since $-r\ln(1-p) \ge 0$,

$$\Pr(e) \le 3e^{-n(1-r)NE + nN\delta'}, \tag{91}$$

which was to be proved. Equation 91 holds if Eq. 90 holds and if, by substituting in Eq. 89,


-NE 


NR  l-r  l-e-“® 

^  e  (2r-l)^ 


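The first condition of (92) is satisfied if

$$N \ge \frac{1}{R}\ln\left[1 + \frac{2(1-r)}{2r-1}\right]; \tag{93}$$

the second, if

$$NR \ge NE + \ln\frac{2(1-r)^2}{(2r-1)^2}, \tag{94}$$

in which we have used $1 - e^{-NE} \le 1$. Equation 94 can be rewritten

$$N \ge \frac{1}{R-E}\ln\frac{2(1-r)^2}{(2r-1)^2}. \tag{95}$$

Here, we assume $R > E$.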

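The third condition of (92) is satisfied if

$$NR \ge \frac{NE\,C_4}{N\delta' - \mathcal{H}(r)}, \tag{96}$$

which can be rewritten

$$N \ge \frac{EC_4/R + \mathcal{H}(r)}{\delta'}. \tag{97}$$

Equations 90, 93, 95, and 97 will be simultaneously satisfied if $N \ge N_0$, where

$$N_0 = \max\left\{\frac{\mathcal{H}(r)}{\delta'},\ \frac{1}{R}\ln\left[1 + \frac{2(1-r)}{2r-1}\right],\ \frac{1}{R-E}\ln\frac{2(1-r)^2}{(2r-1)^2},\ \frac{EC_4 + R\mathcal{H}(r)}{R\delta'}\right\}.$$

Q.E.D.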


This result then provides something for communication theory which was lacking previously: a limited variety of combinations of very long codes and channels which approximate the performance promised by the coding theorem.

For our present interest, this result tells us that once we have decided to concatenate and to treat errors in the superchannel as equiprobable, a Reed-Solomon code is entirely satisfactory as an outer code. If we fail to meet coding-theorem standards of performance, it is because we choose to use minimum-distance rather than maximum-likelihood decoding.


b.  Minimum-Distance  Decoding 

If we use minimum-distance decoding, decoding errors occur when there are $d/2 = n(1-r)/2$ or more symbol errors, so by the Chernoff bound

$$\Pr(e) \le e^{-n\left[-\frac{1-r}{2}\ln p - \frac{1+r}{2}\ln(1-p) - \mathcal{H}\left(\frac{1-r}{2}\right)\right]}. \tag{98}$$

One way of interpreting this is that we need twice as much redundancy for minimum-distance decoding as for maximum-likelihood decoding. Or, for a particular dimensionless rate $r$, we suffer a loss of a factor $K$ in the error exponent, where $K$ goes to 2 when $p$ is very small, and is greater than 2 otherwise. Indeed, when $q = e^{NR}$, $p = e^{-NE}$, and $E \le R$, the loss in the exponent is exactly a factor of two, for (98) becomes

$$\Pr(e) \le e^{-nNE(1-r)/2}.$$
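The factor $K$ by which minimum-distance decoding loses in the exponent can be computed directly from Eqs. 64 and 98. In the sketch below, `chernoff_exponent` is the exponent $-x\ln p - (1-x)\ln(1-p) - \mathcal{H}(x)$ used in both bounds; it is valid for $x > p$. `loss_factor` is the ratio of this exponent at $x = 1-r$ to its value at $x = (1-r)/2$. The choice $r = 1/2$ and the three values of $p$ are illustrative.

```python
import math

def H(x):
    """Binary entropy in nats."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def chernoff_exponent(x, p):
    """-x ln p - (1-x) ln(1-p) - H(x): exponent of Pr(at least n x errors), for x > p."""
    return -x * math.log(p) - (1 - x) * math.log(1 - p) - H(x)

def loss_factor(r, p):
    """K = (maximum-likelihood exponent, Eq. 64) / (minimum-distance exponent, Eq. 98)."""
    return chernoff_exponent(1 - r, p) / chernoff_exponent((1 - r) / 2, p)

for p in (1e-2, 1e-6, 1e-30):
    print(p, loss_factor(0.5, p))
```

At these sample points $K$ is above 2 and decreases toward 2 as $p$ shrinks, consistent with the statement above. The approach is slow, roughly in proportion to $1/\ln(1/p)$.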




5.4  EFFICIENCY  OF  TWO-STAGE  CONCATENATION 




By the coding theorem, we know that for any memoryless channel there is a code of length $N'$ and rate $R'$ such that $\Pr(e) \le e^{-N'E(R')}$, where $E(R')$ is the error exponent of the channel. We shall now show that over this same channel there exists an inner code of length $N$ and rate $R$ and an outer code of length $n$ and dimensionless rate $r$, with $nN = N'$ and $rR = R'$, which when concatenated yield $\Pr(e) \le e^{-N'E_C(R')}$. We define the efficiency $\eta(R') = E_C(R')/E(R')$; then, to the accuracy of the bound, the reciprocal of the efficiency indicates how much greater the over-all length of the concatenated code must be than that of a single code to achieve the same performance, and thereby measures the sacrifice involved in going to concatenation.

For the moment, we consider only the unexpurgated part of the coding-theorem bound, both for the raw channel and for the superchannel, and we assume that the inner decoder forwards no reliability information with its choice. Then there exists a code of length $N$ and rate $R$ for the raw channel such that the superchannel will have $e^{NR}$ inputs, $e^{NR}$ outputs, and a transition probability matrix $p_{ji}$ for which

$$\Pr(e) = e^{-NR}\sum_i\sum_{j \ne i}p_{ji} \le e^{-NE(R)}. \tag{99}$$

Applying the unexpurgated part of the coding theorem bound to this superchannel, we can assert the existence of a code of length $n$ and dimensionless rate $r$ (thus rate $r\ln(e^{NR}) = rNR$) which satisfies

$$\Pr(e) \le e^{-nE(r,\,p_{ji})},$$

where

$$E(r,\,p_{ji}) = \max_{P,\ 0 \le \rho \le 1}\{E_\rho(P,\,p_{ji}) - \rho rNR\}$$

and

$$E_\rho(P,\,p_{ji}) = -\ln\sum_j\left[\sum_i P_i\,p_{ji}^{1/(1+\rho)}\right]^{1+\rho}.$$

We cannot proceed with the computation, since we know no more about the matrix $p_{ji}$ than is implied by Eq. 99. We shall now show, however, that of all transition probability matrices satisfying (99), none has smaller $E(r,\,p_{ji})$ than the matrix $\tilde{p}_{ji}$ defined by

$$\tilde{p}_{ji} = \begin{cases} 1 - e^{-NE(R)}, & i = j, \\ \dfrac{e^{-NE(R)}}{e^{NR}-1}, & i \ne j, \end{cases}$$

which is the transition probability matrix of the ideal superchannel with $e^{NR}$ inputs, and $\Pr(e) = e^{-NE(R)}$. In this sense, the ideal superchannel is quite the opposite of ideal. (In a sense, for a fixed over-all probability of symbol error, the ideal superchannel is the minimax strategy of nature, while the assumption of an ideal superchannel is the corresponding minimax strategy for the engineer.)

First, we need the following lemma, which proves the convexity of $E_\rho(P,\,p_{ji})$ over the convex space of all transition probability matrices.

LEMMA: If $p_{ji}$ and $q_{ji}$ are two probability matrices of the same dimensionality, then for $0 \le \lambda \le 1$,

$$\lambda E_\rho(P,\,p_{ji}) + (1-\lambda)E_\rho(P,\,q_{ji}) \ge E_\rho(P,\,\lambda p_{ji} + (1-\lambda)q_{ji}).$$


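PROOF: The left-hand side of the inequality is

$$\begin{aligned}\lambda E_\rho(P,\,p_{ji}) + (1-\lambda)E_\rho(P,\,q_{ji}) &= -\lambda\ln\sum_j\left[\sum_i P_i\,p_{ji}^{1/(1+\rho)}\right]^{1+\rho} - (1-\lambda)\ln\sum_j\left[\sum_i P_i\,q_{ji}^{1/(1+\rho)}\right]^{1+\rho} \\ &= -\ln\left\{\sum_j\left[\sum_i P_i\,p_{ji}^{1/(1+\rho)}\right]^{1+\rho}\right\}^{\lambda}\left\{\sum_j\left[\sum_i P_i\,q_{ji}^{1/(1+\rho)}\right]^{1+\rho}\right\}^{1-\lambda} \\ &\equiv -\ln L,\end{aligned}$$

while the right is

$$E_\rho(P,\,\lambda p_{ji} + (1-\lambda)q_{ji}) = -\ln\sum_j\left[\sum_i P_i\left(\lambda p_{ji} + (1-\lambda)q_{ji}\right)^{1/(1+\rho)}\right]^{1+\rho} \equiv -\ln R.$$

But

$$\begin{aligned}L &\le \sum_j\left\{\lambda\left[\sum_i P_i\,p_{ji}^{1/(1+\rho)}\right]^{1+\rho} + (1-\lambda)\left[\sum_i P_i\,q_{ji}^{1/(1+\rho)}\right]^{1+\rho}\right\} \\ &= \sum_j\left\{\left[\sum_i P_i(\lambda p_{ji})^{1/(1+\rho)}\right]^{1+\rho} + \left[\sum_i P_i\big((1-\lambda)q_{ji}\big)^{1/(1+\rho)}\right]^{1+\rho}\right\} \\ &\le \sum_j\left[\sum_i P_i\left(\lambda p_{ji} + (1-\lambda)q_{ji}\right)^{1/(1+\rho)}\right]^{1+\rho} = R,\end{aligned}$$

where the first inequality is that between the arithmetic and geometric means, and the second is Minkowski's inequality, which is valid because $0 < 1/(1+\rho) \le 1$. But if $L \le R$, $-\ln L \ge -\ln R$, so the lemma is proved.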


From this lemma one can deduce by induction that $\overline{E_\rho(P,\,p_{ji})} \ge E_\rho(P,\,\overline{p_{ji}})$, where the bar indicates an average over any ensemble of transition probability matrices. The desired theorem follows.
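The convexity lemma can be spot-checked on random channels. The sketch below is illustrative only; the matrix size, the random seed, and the number of trials are arbitrary. It stores a transition matrix as `p[j][i]`, the probability of output $j$ given input $i$. It then evaluates $E_\rho(P,\,p_{ji})$ with the uniform assignment $P$ and counts how often the inequality of the lemma fails by more than rounding error.

```python
import math
import random

def E_rho(P, p, rho):
    """E_rho(P, p) = -ln sum_j [sum_i P_i p[j][i]^(1/(1+rho))]^(1+rho).

    p[j][i] is the probability of output j given input i (each column sums to 1).
    """
    s = 1.0 / (1.0 + rho)
    total = 0.0
    for row in p:
        inner = sum(Pi * pji ** s for Pi, pji in zip(P, row))
        total += inner ** (1.0 + rho)
    return -math.log(total)

def random_channel(q, rng):
    """Random q x q transition matrix stored as p[j][i], each column summing to 1."""
    cols = []
    for _ in range(q):
        w = [rng.random() for _ in range(q)]
        tot = sum(w)
        cols.append([x / tot for x in w])
    return [[cols[i][j] for i in range(q)] for j in range(q)]   # transpose to p[j][i]

rng = random.Random(1)
q = 4
P = [1.0 / q] * q
violations = 0
for _ in range(500):
    a, b = random_channel(q, rng), random_channel(q, rng)
    lam, rho = rng.random(), rng.random()
    mix = [[lam * x + (1 - lam) * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    lhs = lam * E_rho(P, a, rho) + (1 - lam) * E_rho(P, b, rho)
    if lhs < E_rho(P, mix, rho) - 1e-12:
        violations += 1
print("violations:", violations)
```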

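THEOREM: If $e^{-NR}\sum_i\sum_{j \ne i}p_{ji} = K \le e^{-NE(R)}$, then

$$E(r,\,p_{ji}) \ge E(r,\,\tilde{p}_{ji}),$$

where

$$\tilde{p}_{ii} = 1 - e^{-NE(R)}, \ \text{all } i; \qquad \tilde{p}_{ji} = \frac{e^{-NE(R)}}{e^{NR}-1}, \ \text{all } j \ne i.$$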

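PROOF: Let $e^{-NR}$ denote the particular assignment $P$ in which $P_i = e^{-NR}$, all $i$, which because of its symmetry is clearly the optimum assignment for the ideal superchannel. Then

$$E(r,\,p_{ji}) = \max_{P,\,\rho}\left[E_\rho(P,\,p_{ji}) - \rho rNR\right] \ge \max_{0 \le \rho \le 1}\left[E_\rho(e^{-NR},\,p_{ji}) - \rho rNR\right].$$

Suppose we permute the inputs and outputs so that the one-to-one correspondence between them is maintained, thereby getting a new matrix $p'_{ji}$, for which evidently $E_\rho(e^{-NR},\,p'_{ji}) = E_\rho(e^{-NR},\,p_{ji})$. Averaging over the ensemble of all such permutations, and noting that

$$\overline{p'_{ii}} = 1 - K, \ \text{all } i; \qquad \overline{p'_{ji}} = \frac{K}{e^{NR}-1}, \ \text{all } j \ne i,$$

we have

$$E_\rho(e^{-NR},\,p_{ji}) = \overline{E_\rho(e^{-NR},\,p'_{ji})} \ge E_\rho(e^{-NR},\,\overline{p'_{ji}}).$$

Obviously, $E_\rho(e^{-NR},\,\overline{p'_{ji}}) \ge E_\rho(e^{-NR},\,\tilde{p}_{ji})$ since $K \le e^{-NE(R)}$, so that finally

$$E(r,\,p_{ji}) \ge \max_{0 \le \rho \le 1}\left[E_\rho(e^{-NR},\,\tilde{p}_{ji}) - \rho rNR\right] = E(r,\,\tilde{p}_{ji}).$$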
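The theorem can likewise be spot-checked. For random channels, the uniform-input $E_\rho$ should never fall below that of the ideal superchannel with the same total error probability $K$. In this sketch the random matrices are biased toward the diagonal only so that $K$ stays moderate, which is an arbitrary choice. `ideal` builds the matrix with $1-K$ on the diagonal and $K/(q-1)$ elsewhere; the names are illustrative.

```python
import math
import random

def E_rho(P, p, rho):
    """-ln sum_j [sum_i P_i p[j][i]^(1/(1+rho))]^(1+rho), with p[j][i] = Pr(out j | in i)."""
    s = 1.0 / (1.0 + rho)
    return -math.log(sum(sum(Pi * pji ** s for Pi, pji in zip(P, row)) ** (1 + rho)
                         for row in p))

def ideal(q, K):
    """Ideal superchannel with total error probability K."""
    return [[1 - K if i == j else K / (q - 1) for i in range(q)] for j in range(q)]

def random_channel(q, rng, diag_weight=5.0):
    """Random transition matrix p[j][i] (columns sum to 1), biased toward correct outputs."""
    cols = []
    for i in range(q):
        w = [rng.random() + (diag_weight if j == i else 0.0) for j in range(q)]
        tot = sum(w)
        cols.append([x / tot for x in w])
    return [[cols[i][j] for i in range(q)] for j in range(q)]

rng = random.Random(7)
q = 5
P = [1.0 / q] * q                     # the symmetric assignment P_i = 1/q
worst_gap = float("inf")
for _ in range(300):
    p = random_channel(q, rng)
    K = sum(p[j][i] for i in range(q) for j in range(q) if j != i) / q   # Pr(e)
    for rho in (0.1, 0.5, 1.0):
        gap = E_rho(P, p, rho) - E_rho(P, ideal(q, K), rho)
        worst_gap = min(worst_gap, gap)
print("smallest E_rho(p) - E_rho(ideal):", worst_gap)
```

The comparison is made at equal $K$, which is exactly the permutation-averaging step of the proof. The final step, from $K$ to $e^{-NE(R)}$, is not exercised here.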
In section 5.2 (Eq. 68) we computed the error exponent for this case, and found that

$$\Pr(e) \le e^{-nNE^*(r,\,R)},$$

where

$$E^*(r,\,R) = (1-r)\min\{R,\,E(R)\}. \tag{100}$$

To get the tightest bound for a fixed over-all rate $R'$, we maximize $E^*(r,\,R)$ subject to the constraint $rR = R'$. Let us define $R_E$ to be the $R$ satisfying $R_E = E(R_E)$; clearly, we never want $R < R_E$, so that $E_C(R')$ can be expressed

$$E_C(R') = \max_{rR = R'} E(R)(1-r). \tag{101}$$

The  computational  results  of  Section  VI  suggest  that  the  r  and  R  maximizing  this 
expression  are  good  approximations  to  the  rates  that  are  best  used  in  the  concatenation 
of  BCH  codes. 

Geometrically, we can visualize how $E_C(R')$ is related to $E(R)$ in the following way (see Fig. 9). Consider Eq. 100 in terms of $R'$ for a fixed $R$:

$$E^*(R') = \left(1 - \frac{R'}{R}\right)\min\{R,\,E(R)\}.$$

This is a linear function of $R'$ which equals zero at $R' = R$ and equals $\min\{R,\,E(R)\}$ at $R' = 0$. In Fig. 9 we have sketched this function for $R = R_1$, $R_2$, and $R_3$ greater than $R_E$, for $R_E$, and for $R_4$ less than $R_E$. $E_C(R')$ may be visualized as the upper envelope of all these functions.

As $R'$ goes to zero, the maximization of (101) is achieved by $R = R_E$, $r \to 0$, so that

$$E_C(0) = E(R_E) = R_E.$$

Since the $E(R)$ curve lies between the two straight lines $L_1 = E(0)$ and $L_2 = E(0) - R$, we have

$$E(0) \ge E(R_E) \ge E(0) - R_E,$$

or

$$E(0) \ge E(R_E) \ge \tfrac{1}{2}E(0).$$

The efficiency $\eta(0) = E_C(0)/E(0)$ is therefore between one-half and one at $R' = 0$.

As $R'$ goes to the capacity $C$, $E_C(R')$ remains greater than zero for all $R' < C$, but the efficiency approaches zero. For, let $E(R) = K(C-R)^2$ near capacity, which is the normal case (and is not essential to the argument). Let $R' = C(1-\epsilon)$, $\epsilon > 0$; the maximum of (101) occurs at $R = C(1-2\epsilon/3)$, where $E_C(R') = 4\epsilon^3KC^2/27 > 0$. Hence $\eta(R') = 4\epsilon/27$, so that the efficiency goes to zero as $R'$ goes to $C$. The efficiency is proportional to $(1 - R'/C)$, however, which indicates that the drop-off is not precipitous. Most important, the property that makes the coding theorem so provocative, exponential decrease in $\Pr(e)$ at all rates below capacity, is preserved.

Fig. 9. Derivation of $E_C(R')$ from $E(R)$.
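The near-capacity estimate $\eta(R') \approx 4\epsilon/27$ can be reproduced numerically. The sketch below assumes the quadratic form $E(R) = K(C-R)^2$ with illustrative values $C = 1$ and $K = 2$, not taken from any particular channel. It evaluates $E_C(R')$ from Eqs. 100 and 101 by a grid search over $R$ and prints the resulting efficiency beside $4\epsilon/27$.

```python
def concatenated_exponent(E, R_prime, C, grid=100000):
    """E_C(R') = max over R in (R', C] of (1 - R'/R) * min(R, E(R))  (Eqs. 100 and 101)."""
    best = 0.0
    for k in range(1, grid + 1):
        R = R_prime + (C - R_prime) * k / grid
        best = max(best, (1 - R_prime / R) * min(R, E(R)))
    return best

C, Kc = 1.0, 2.0                       # illustrative capacity and curvature (assumptions)
E = lambda R: Kc * (C - R) ** 2        # the "normal" quadratic behaviour near capacity
for eps in (0.1, 0.03, 0.01):
    Rp = C * (1 - eps)
    eta = concatenated_exponent(E, Rp, C) / E(Rp)
    print(eps, eta, 4 * eps / 27)
```

The agreement improves as $\epsilon \to 0$, since the estimate neglects terms of relative order $\epsilon$.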

We know from the previous discussion that over that part of the curved segment of $E_C(R')$ for which $r \ge 1/2$, which will normally be [when $E(R_E)$ is on the straight-line segment of $E(R)$] the entire curved segment, Reed-Solomon codes are capable of achieving the error exponent $E_C(R')$ if we use maximum-likelihood decoding. If we use minimum-distance decoding, then we can achieve only

$$\Pr(e) \le e^{-nNE_m(R')},$$

where

$$E_m(R') = \max_{rR = R'} E(R)(1-r)/2.$$

Over the curved segment of $E_C(R')$, therefore, $E_m(R')$ is one-half of $E_C(R')$; below this segment $E_m(R')$ will be greater than $E_C(R')/2$, and, in fact, for $R' = 0$

$$E_m(0) = E(0)/2,$$

which will normally equal $E_C(0)$. Thus minimum-distance decoding costs us a further factor of one-half or better in efficiency, but, given the large sacrifice in efficiency already made in going to concatenated codes, this further sacrifice seems a small enough price to pay for the great simplicity of minimum-distance decoding.
Fig. 10. $E(R)$ curve for original channel.


In Fig. 10 we plot the concatenated exponent $E_C(R')$, the minimum-distance exponent $E_m(R')$, and the original error exponent $E(R')$ of a binary symmetric channel with crossover probability .01. The efficiency ranges from 1/2 to approximately .02 at 9/10 of capacity, which indicates that concatenated codes must be from 2 to 50 times longer than unconcatenated. We shall find that these efficiencies are roughly those obtained in the concatenation of BCH codes.
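The efficiency $\eta(R') = E_C(R')/E(R')$ for the binary symmetric channel with crossover probability .01 can be recomputed from Gallager's unexpurgated exponent. For uniform inputs this exponent uses $E_0(\rho) = \rho\ln 2 - (1+\rho)\ln[p^{1/(1+\rho)} + (1-p)^{1/(1+\rho)}]$. The sketch below is an independent illustration, not the program used for Fig. 10. It uses the unexpurgated bound only, ternary search over $\rho$, and a coarse grid over $R$, so its values are approximate.

```python
import math

def E0_bsc(rho, p):
    """Gallager's E0 for a BSC with uniform inputs (nats)."""
    s = 1.0 / (1.0 + rho)
    return rho * math.log(2) - (1 + rho) * math.log(p ** s + (1 - p) ** s)

def E_bsc(R, p, iters=60):
    """Unexpurgated exponent: max over 0 <= rho <= 1 of E0(rho) - rho R (ternary search)."""
    f = lambda rho: E0_bsc(rho, p) - rho * R
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return max(0.0, f((lo + hi) / 2))

def capacity_bsc(p):
    """C = ln 2 - H(p) in nats."""
    return math.log(2) + p * math.log(p) + (1 - p) * math.log(1 - p)

def E_C(R_prime, p, grid=500):
    """Eq. 101 via Eq. 100: max over R in (R', C] of (1 - R'/R) * min(R, E(R))."""
    C = capacity_bsc(p)
    best = 0.0
    for k in range(1, grid + 1):
        R = R_prime + (C - R_prime) * k / grid
        best = max(best, (1 - R_prime / R) * min(R, E_bsc(R, p)))
    return best

p = 0.01
C = capacity_bsc(p)
for frac in (0.0, 0.5, 0.9):
    Rp = frac * C
    print(frac, E_C(Rp, p) / E_bsc(Rp, p))     # efficiency eta(R')
```

At $R' = 0$ the printed efficiency should be close to 1/2, up to grid error. The reason is that for this channel $R_E$ falls on the straight-line part of $E(R)$.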

It is clear that in going to a great number of stages, the error exponent approaches zero everywhere, as we would expect.

We have not considered the expurgated part of the coding-theorem bound for two reasons: first, we are usually not interested in concatenating unless we want to signal at high rates, for which complex schemes are required; second, a lemma for the expurgated bound similar to our earlier lemma is lacking, so that we are not sure the ideal superchannel is the worst of all possible channels for this range. Assuming such a lemma, we then find nothing essentially new in this range; in particular, $\eta(0)$ remains equal to 1/2.

Finally, let us suppose that the inner decoder has the option of making deletions. Since all deletions are equivalent, we lump them into a single output, so that now the superchannel has $e^{NR}$ inputs and $1 + e^{NR}$ outputs. Let the error probability for the superchannel be $e^{-NE}$ and the deletion probability $e^{-ND}$; assuming the ideal superchannel with deletions again the worst, we have

$$\Pr(e) \le e^{-nE(r)},$$

where

$$E(r) = \max_{P,\,\rho}\left[E_\rho(P) - \rho NRr\right].$$

Thus a deletion capability cannot improve the concatenation exponent $E_C(R')$, although it can, of course, bring the minimum-distance exponent $E_m(R')$ closer to $E_C(R')$, and thereby lessen the necessary block length by a factor less than two.
VI. COMPUTATIONAL PROGRAM


The theoretical results that we have obtained are suggestive; however, what we really want to know is how best to design a communication system to meet a specified standard of performance. The difficulty of establishing meaningful measures of complexity forces us to the computational program described here.

6.1 CODING FOR DISCRETE MEMORYLESS CHANNELS

We first investigate the problem of coding for a memoryless channel for which the
modulation and demodulation have already been specified, so that what we see is a
channel with q inputs, q outputs, and probability of error p. If we are given a desired
over-all rate R' and over-all probability of decoding error Pr(e), we set ourselves the task
of constructing a list of different coding schemes with rate R' and probability of decoding
error upper-bounded by Pr(e).

The types of coding schemes that we contemplate are the following. We could use
a single BCH code on GF(q) with errors-only minimum-distance decoding. Or we could
concatenate an RS outer code in any convenient field with an inner BCH code. In the latter
case, the RS decoder could be set for errors-only or modified deletions-and-errors
decoding (cf. sec. 4.6b); we do not consider generalized minimum-distance decoding,
because of the difficulty of getting the appropriate probability bounds. If the outer decoder
is set for errors-only decoding, the inner decoder is set to correct as many errors as it
can, and any uncorrected word is treated by the outer decoder as an error. If the outer
decoder can correct deletions, however, the inner decoder is set to correct only up to
some number of errors that may be less than the maximum correctable number, and
uncorrected words are treated by the outer decoder as deletions.
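A minimal sketch of the two ways inner-decoder results can be handed to the outer decoder follows. It is an illustration under stated assumptions, not the report's procedure: both decoders are assumed to be bounded-distance decoders, the names `InnerResult`, `classify_inner`, `outer_input`, and `outer_decodes` are hypothetical, and whether a word with more than `t1` errors falls near another code word depends on the particular code, so the sketch simply takes that as an input.

```python
from enum import Enum
from typing import Iterable, Tuple


class InnerResult(Enum):
    CORRECT = "correct"    # inner word decoded correctly
    ERROR = "error"        # decoded to a wrong inner code word (undetected error)
    DELETION = "deletion"  # inner decoder declined to decide (decoding failure)


def classify_inner(bit_errors: int, t1: int, near_other_codeword: bool) -> InnerResult:
    """Hypothetical bounded-distance inner decoder that corrects up to t1 errors."""
    if bit_errors <= t1:
        return InnerResult.CORRECT
    # Beyond t1 errors the received word either lies within t1 of some other
    # code word (an undetected error) or of none (a decoding failure).
    return InnerResult.ERROR if near_other_codeword else InnerResult.DELETION


def outer_input(results: Iterable[InnerResult], errors_only: bool) -> Tuple[int, int]:
    """(symbol errors, deletions) as presented to the outer RS decoder."""
    results = list(results)
    errors = sum(r is InnerResult.ERROR for r in results)
    failures = sum(r is InnerResult.DELETION for r in results)
    if errors_only:
        # An errors-only outer decoder treats every uncorrected word as an error.
        return errors + failures, 0
    return errors, failures


def outer_decodes(errors: int, deletions: int, d: int) -> bool:
    """Bounded-distance deletions-and-errors decoding succeeds when 2e + s < d."""
    return 2 * errors + deletions < d


words = [InnerResult.CORRECT] * 30 + [InnerResult.DELETION] * 4 + [InnerResult.ERROR] * 2
print(outer_decodes(*outer_input(words, errors_only=True), d=11))   # False
print(outer_decodes(*outer_input(words, errors_only=False), d=11))  # True
```

With an outer code of minimum distance 11, two inner decoding errors plus four inner decoding failures defeat the errors-only outer decoder (six errors, and 2·6 ≥ 11) but not the deletions-and-errors decoder (2·2 + 4 = 8 < 11).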

Formulas for computing the various probabilities involved are derived and discussed
in Appendix B. In general, we are successful in finding formulas that are both valid upper
bounds and good approximations to the exact probabilities required. The only exception
is the formula for the probability of undetected error in the inner decoder when the inner
decoder has the option of deletions: there the lack of good bounds on the distribution of
weights in BCH codes forces us to settle for a valid upper bound that is not a good
approximation.
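As a rough cross-check on one entry of Table 1, the probability of decoding error for an errors-only concatenated scheme can be computed from binomial tails. This sketch is not the Appendix B formulas: it assumes bounded-distance decoders, so that an inner (N, K) word is wrong exactly when more than T of its N bits are in error and the outer decoder fails exactly when more than t of its n symbols are wrong. Because inner words occupy disjoint digits of a memoryless channel, the superchannel symbol errors are independent. The helper names are illustrative.

```python
from math import comb


def binom_tail(n: int, p: float, t: int) -> float:
    """P(more than t of n independent trials fail), each failing with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(t + 1, n + 1))


def concatenated_error_prob(p: float, N: int, T: int, n: int, t: int) -> float:
    """Errors-only concatenation on a BSC with crossover probability p.

    Assumes bounded-distance decoders: an inner (N, K) word is wrong (error or
    failure) exactly when more than T of its N bits are in error, and the outer
    (n, k) RS decoder fails exactly when more than t of its n symbols are wrong.
    """
    p_super = binom_tail(N, p, T)     # superchannel symbol error probability
    return binom_tail(n, p_super, t)  # probability the outer decoder does not deliver the right word


# Table 1 entries: (63,36) inner with T = 5 and (48,42) outer with t = 3,
# and the single-stage (414,207) code with T = 25.
print(binom_tail(63, 0.01, 5))                      # about 4e-5
print(concatenated_error_prob(0.01, 63, 5, 48, 3))  # about 6e-13
print(binom_tail(414, 0.01, 25))                    # about 3e-13
```

Under these assumptions both schemes meet the 10^-12 specification, the concatenated one by a margin of less than a factor of two.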

Within this class of possible schemes, we restrict our attention to a set of 'good'
codes. Tables 1-6 are representative of such lists. Tables 1-4 concern a binary symmetric
channel with p = .01; the specifications considered are Pr(e) = 10^-12 for Tables 1-3,
Pr(e) = 10^-6 for Table 4, and R' = .5 for Table 1, .7 for Tables 2 and 4, and .8 for
Table 3. (For this channel C = .92 bits and $R_{comp}$ = .74.) Table 5 concerns a binary
symmetric channel with p = .1 (so that C = .53 and $R_{comp}$ = .32); the specifications
are R' = .15 and Pr(e) = 10^-6. Table 6 concerns a 32-ary channel with p = .01 (so
that C = 4.86 and $R_{comp}$ = 4.11); the specifications are R' = 4 and Pr(e) = 10^-12.
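The capacities and cutoff rates quoted for the two binary symmetric channels can be reproduced with the standard formulas, assuming $R_{comp}$ here denotes the usual computational cutoff rate with equiprobable inputs. The sketch covers only the binary case and makes no attempt to reproduce the 32-ary figures; the function names are illustrative.

```python
from math import log2, sqrt


def bsc_capacity(p: float) -> float:
    """C = 1 - H(p) bits per channel use for a binary symmetric channel."""
    return 1 + p * log2(p) + (1 - p) * log2(1 - p)


def bsc_r_comp(p: float) -> float:
    """Cutoff rate 1 - log2(1 + 2*sqrt(p*(1-p))) bits per use, equiprobable inputs."""
    return 1 - log2(1 + 2 * sqrt(p * (1 - p)))


for p in (0.01, 0.1):
    print(p, round(bsc_capacity(p), 2), round(bsc_r_comp(p), 2))
# 0.01 0.92 0.74
# 0.1 0.53 0.32
```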

Since  the  value  of  a  particular  scheme  depends  strongly  upon  details  of  implementation 
Table 1. Codes of rate .5 that achieve Pr(e) < 10^-12 on a binary symmetric channel
with crossover probability p = .01.

(N,K)       D    T    (n,k)       d    t    nN      Comment
(414,207)   51   25   --          --   --   414     one stage
(15,11)     3    1    (76,52)     25   12   1140    e-o
(31,21)     5    2    (69,51)     19   9    2139    e-o
(63,36)     11   5    (48,42)     7    3    3024    'best' e-o
(63,39)     9    4    (52,42)     11   5    3276    e-o
(63,45)     7    3    (54,38)     17   8    3402    e-o
(127,71)    19   9    (38,34)     5    2    4826    e-o
(127,78)    15   7    (33,27)     7    3    4191    e-o
(127,85)    13   6    (32,24)     9    4    4064    e-o
(127,92)    11   5    (46,32)     15   7    5842    e-o
(127,99)    9    4    (62,40)     23   11   7874    e-o
(31,20)     6    2    (45,35)     11   5    1364    d&e
(31,21)     5    1    (77,57)     21   4    2387    d&e
(63,36)     11   4    (40,35)     6    2    2520    d&e
(63,36)     11   3    (72,63)     10   1    4536    d&e
(63,38)     10   4    (41,34)     8    3    2583    d&e
(63,38)     10   3    (47,39)     9    2    4536    d&e
(63,39)     9    3    (42,34)     9    4    2646    d&e

Notes: Tables 1-6

N (n) = length of inner (outer) code
K (k) = number of information digits
D (d) = minimum distance (d - 1 is the number of deletions corrected)
T (t) = maximum number of errors corrected
nN = over-all block length
Comment: e-o = errors-only, d&e = deletions-and-errors decoding in the
outer decoder.
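The bookkeeping behind each row can be checked with a few lines of arithmetic: the over-all block length is the product nN, the over-all rate is the product of the inner and outer dimensionless rates, and, since the outer codes are RS codes, d = n - k + 1. This sketch assumes binary inner codes (for the 32-ary channel of Table 6 the rate would be multiplied by log2 32 = 5 bits); the helper names are hypothetical.

```python
from fractions import Fraction


def overall_params(N: int, K: int, n: int, k: int):
    """Over-all block length nN and rate (K/N)(k/n) of a two-stage concatenation."""
    return n * N, Fraction(K, N) * Fraction(k, n)


def rs_distance(n: int, k: int) -> int:
    """Minimum distance of an (n, k) Reed-Solomon code, which is maximum-distance separable."""
    return n - k + 1


length, rate = overall_params(63, 36, 48, 42)  # the 'best' errors-only entry of Table 1
print(length, rate, rs_distance(48, 42))        # 3024 1/2 7
```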
Table 2. Codes of rate .7 that achieve Pr(e) < 10^-12 on a binary symmetric channel
with crossover probability p = .01.

(N,K)         D     T    (n,k)         d    t    nN       Comment
(2740,1913)   143   71   --            --   --   2740     one stage
(127,99)      9     4    (530,476)     55   27   67310    e-o
(255,207)     13    6    (465,401)     65   32   118575   e-o
(255,199)     15    7    (292,262)     31   15   74460    e-o
(255,191)     17    8    (306,286)     21   10   78030    e-o
(255,187)     19    9    (308,294)     15   7    78540    'best' e-o
(127,98)      10    4    (324,294)     31   12   41148    d&e
(127,92)      11    4    (1277,1234)   43   5    162179   d&e
(127,91)      12    5    (1084,1059)   25   10   137663   d&e
(255,199)     15    6    (214,192)     23   4    54570    d&e
(255,198)     16    6    (234,211)     24   3    59670    d&e
(255,198)     16    7    (214,193)     22   9    54570    d&e
(255,191)     17    7    (214,200)     15   3    54570    d&e
(255,190)     18    7    (232,218)     15   3    59160    d&e
(255,190)     18    8    (232,218)     15   7    59160    d&e
(255,187)     19    8    (198,189)     10   3    50490    d&e
(255,186)     20    8    (224,215)     10   2    57120    d&e
Table 3. Codes of rate .8 that achieve Pr(e) < 10^-12 on a binary symmetric channel
with crossover probability p = .01.

(N,K)         D    T    (n,k)         d    t    nN        Comment
no single-stage code
(2047,1695)   67   33   (1949,1883)   67   33   3989603   e-o
(2047,1684)   69   34   (1670,1624)   47   23   3418490   'best' e-o
(2047,1673)   71   35   (1702,1666)   37   18   3483994   e-o
(2047,1662)   73   36   (2044,2014)   31   15   4184068   e-o
(2047,1695)   67   31   (1477,1427)   51   3    3023419   d&e
(2047,1695)   67   32   (886,856)     31   6    1813642   d&e
(2047,1684)   69   32   (1234,1200)   35   3    2525998   d&e
(2047,1684)   69   33   (763,742)     22   5    1561861   d&e
(2047,1673)   71   34   (804,787)     18   5    1645788   d&e



Table 4. Codes of rate .7 that achieve Pr(e) < 10^-6 on a binary symmetric channel
with crossover probability p = .01.

(N,K)       D    T    (n,k)       d    t    nN      Comment
(784,549)   49   24   --          --   --   784     one stage
(127,99)    9    4    (236,212)   25   12   29972   e-o
(127,92)    11   5    (475,459)   17   8    60325   e-o
(255,207)   13   6    (204,176)   29   14   52020   e-o
(255,199)   15   7    (136,122)   15   7    34680   e-o
(255,191)   17   8    (123,115)   9    4    31365   'best' e-o
(255,187)   19   9    (132,126)   7    3    33660   e-o
(127,98)    10   4    (564,545)   20   2    71628   d&e
(127,92)    11   4    (140,127)   14   5    17780   d&e
(127,91)    12   5    (477,466)   12   4    60579   d&e
(255,206)   14   6    (128,111)   18   3    32640   d&e
(255,199)   15   6    (98,88)     11   2    24990   d&e
(255,198)   16   6    (102,92)    11   1    26010   d&e
(255,198)   16   7    (92,83)     10   4    23460   d&e
(255,191)   17   7    (92,86)     7    1    23460   d&e
(255,190)   18   7    (100,94)    7    1    25500   d&e
(255,190)   18   8    (100,94)    7    3    25500   d&e
(255,187)   19   8    (88,84)     5    1    22440   d&e
(255,186)   20   3    (100,96)    5    1    25500   d&e

Table 5. Codes of rate .15 that achieve Pr(e) < 10^-6 on a binary symmetric channel
with crossover probability p = .1.

(N,K)      D     T    (n,k)      d    t    nN     Comment
(511,76)   171   85   --         --   --   511    one stage
(31,11)    11    5    (59,25)    35   17   1829   e-o
(31,6)     15    7    (54,42)    13   6    1674   e-o
(63,18)    21    10   (51,27)    25   12   3213   e-o
(63,16)    23    11   (35,21)    15   7    2205   e-o
(31,11)    11    4    (40,17)    24   5    1240   d&e
(31,10)    12    4    (43,20)    24   4    1333   d&e
(31,10)    12    5    (47,22)    26   10   1457   d&e
(31,6)     15    5    (116,90)   27   2    3596   d&e
(31,6)     15    6    (45,35)    11   3    1395   d&e



Table 6. Codes of rate 4 that achieve Pr(e) < 10^-12 on a 32-ary symmetric channel
with probability of error p = .01.

(N,K)       D    T    (n,k)         d    t    nN       Comment
(540,432)   57   28   --            --   --   540      one stage
(31,27)     5    2    (393,361)     33   16   12183    e-o (both codes RS)
(31,25)     7    3    (3250,3224)   27   13   100750   e-o
(148,125)   13   6    (341,323)     19   9    50468    e-o
(148,121)   15   7    (652,638)     15   7    96496    e-o
(223,196)   15   7    (245,223)     23   11   54635    e-o
(223,192)   17   8    (198,184)     15   7    44154    e-o
(223,188)   19   9    (196,186)     11   5    43708    e-o
(298,267)   17   8    (243,217)     27   13   72414    e-o
(298,263)   19   9    (172,156)     17   8    51256    e-o
(298,259)   21   10   (151,139)     13   6    44998    e-o
(298,255)   23   11   (123,115)     9    4    36654    e-o
(298,251)   25   12   (120,114)     7    3    35760    e-o
(31,26)     6    2    (434,414)     21   7    13454    d&e
(148,125)   13   5    (266,252)     15   2    39368    d&e
(148,123)   14   6    (375,361)     15   6    55500    d&e
(148,121)   15   6    (466,456)     11   2    68968    d&e
(223,196)   15   6    (168,153)     16   2    37464    d&e
(223,192)   17   7    (128,119)     10   2    28544    d&e
(298,263)   19   8    (107,97)      11   2    31886    d&e
(298,259)   21   9    (89,82)       8    2    26522    d&e
and the requirements of a particular system, we cannot say that a particular entry on
any of these lists is 'best.' If minimum over-all block length is the overriding criterion,
then a single stage of coding is the best solution; however, we see that using only a
single stage to achieve certain specifications may require the correction of a great
number of errors, so that almost certainly at some point the number of decoding
computations becomes prohibitive. Then the savings in number of computations which
concatenation affords may be quite striking.

Among the concatenated codes with errors-only decoding in the outer decoder, the
'best' code is not too difficult to identify approximately, since the codes that correct the
fewest errors over all also tend to be those with comparatively short block lengths.
Tables 7 and 8 display such 'best' codes for a range of rates and Pr(e) = 10^-12 and 10^-6,
on a BSC with p = .01; the best single-stage codes are also shown for comparison.

a.  Discussion 

From these tables we may draw a number of conclusions, which we shall now
discuss.

From Tables 1-6 we can evaluate the effects of using deletions-and-errors rather
than errors-only decoding in the outer decoder. These are
1. negligible effect on the inner code;
2. reduction of the length of the outer code, and hence of the over-all block length, by a
factor less than two; and
3. appreciable savings in the number of computations required in the outer decoder.

From comparison of Tables 2 and 4, and of Tables 7 and 8, we find that the effects of
squaring the required probability of error, at moderately high rates, are
1. negligible effect on the inner code; and
2. increase of the length of the outer code, and hence of the over-all block length, by a
factor greater than two.

We conclude that, at the moderately high rates where concatenation is most useful,
the complexity of the inner code is affected only by the rate required, for a given
channel.

These conclusions may be understood in the light of the following considerations.
Observe the columns in Tables 7 and 8 which tabulate the probability of decoding error
for the inner decoder, which is the probability of error in the superchannel seen by the
outer decoder. This probability remains within a narrow range, approximately 10^-3 to
10^-4, largely independent of the rate or over-all probability of error required. It seems
that the only function of the inner code is to bring the probability of error to this level,
at a rate slightly above the over-all rate required.

Thus the only relevant question for the design of the inner coder is: How long a block
length is required to bring the probability of decoding error down to 10^-3 or so, at a rate
somewhat in excess of the desired rate? If the outer decoder can handle deletions, then
we substitute the probability of decoding failure for that of decoding error in this
question, but without greatly affecting the answer, since getting sufficient minimum
distance at the desired rate is the crux of the problem.

Once the inner code has achieved this moderate probability of error, the function of
the outer code is to drive the over-all probability of error down to the desired value, at
a dimensionless rate near one.

The arguments of section 5.4 are a useful guide to understanding these results.
Recall that when the probability of error in the superchannel was small, the over-all
probability of error was bounded by an expression of the form

$$\Pr(e) \le e^{-nN E(R')}.$$

Once we have made the superchannel probability of error 'small' (apparently ~10^-3), we
then achieve the desired over-all probability of error by increasing n. To square
Pr(e), we would expect to have to double n. Actually, n increases by more than a factor
of two, which is due to our keeping the inner and outer decoders of comparable
complexity.
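The exponential dependence on n can be seen in a small numerical search. This is an illustration under stated assumptions, not a reproduction of the report's tables: the superchannel symbol error probability is fixed at a hypothetical 10^-3, the outer RS code has dimensionless rate 9/10 and errors-only bounded-distance decoding, field-size limits on the RS length are ignored, and `shortest_outer_length` is an illustrative helper.

```python
from fractions import Fraction
from math import ceil, comb


def binom_tail(n: int, p: float, t: int) -> float:
    """P(more than t of n independent symbols are in error)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(t + 1, n + 1))


def shortest_outer_length(p_super: float, rate: Fraction, target: float, max_n: int = 5000) -> int:
    """Shortest RS length n, with k = ceil(rate * n) and errors-only decoding
    of t = floor((n - k) / 2) errors, whose failure probability is below target."""
    for n in range(1, max_n + 1):
        k = ceil(rate * n)
        t = (n - k) // 2
        if binom_tail(n, p_super, t) < target:
            return n
    raise ValueError("no length up to max_n meets the target")


n6 = shortest_outer_length(1e-3, Fraction(9, 10), 1e-6)
n12 = shortest_outer_length(1e-3, Fraction(9, 10), 1e-12)
print(n6, n12)  # 60 160
```

Here squaring the target takes the outer length from 60 to 160. The ratio exceeds two in this toy setting because t must be an integer and because the exponent argument neglects sub-exponential factors; it does not model the balancing of inner- and outer-decoder complexity described above.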

That the length of the outer code decreases by somewhat less than a factor of two
when deletions-and-errors decoding is permitted is entirely in accord with the results
of section 5.4. Basically, the reason is that correcting a certain number of deletions
requires only half as many check digits in the outer code as correcting the same
number of errors, so that for a fixed rate and equal probabilities of deletion or error,
the deletion corrector will be approximately half as long.
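The factor of two in check digits follows from the decoding condition for an RS code of minimum distance d = n - k + 1: it can correct s deletions when s ≤ d - 1, but e errors only when 2e ≤ d - 1. The sketch below turns that into outer-code lengths for a fixed number of symbols to be corrected and a fixed dimensionless rate; both the fixed count and the helper names are assumptions made for illustration.

```python
from fractions import Fraction
from math import ceil


def check_digits_needed(count: int, deletions: bool) -> int:
    """RS check digits n - k = d - 1 needed to correct `count` deletions or `count` errors."""
    return count if deletions else 2 * count


def outer_length(count: int, rate: Fraction, deletions: bool) -> int:
    """Shortest RS length n with k/n >= rate that corrects `count` deletions or errors."""
    # n - k >= check digits together with k >= rate * n requires n * (1 - rate) >= check digits.
    return ceil(check_digits_needed(count, deletions) / (1 - rate))


r = Fraction(9, 10)
print(outer_length(10, r, deletions=False), outer_length(10, r, deletions=True))  # 200 100
```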

Finally, we observe that, surprisingly, the ratios of the over-all length of a
concatenated code of a given rate to that of a single-stage code of the same rate are given
qualitatively by the efficiencies computed in section 5.4. This is surprising because the
bounds of that section were derived by random-coding arguments, whereas here we consider
BCH codes, and because those bounds are probably not tight. The dimensionless rate of
the outer code also agrees approximately with that specified in section 5.4 as optimum
for a given over-all rate.

In summary, the considerations of section 5.4 seem to be adequate for a qualitative
understanding of the performance of concatenated codes on discrete memoryless
channels.

6.2 CODING FOR A GAUSSIAN CHANNEL

We shall now take up the problem of coding for a white additive Gaussian noise
channel with no bandwidth restrictions, as an example of a situation in which we have some
freedom in choosing how to modulate the channel.

One feasible and near-optimum modulation scheme is to send one of $M = 2^{R_0}$
biorthogonal waveforms every T seconds over the channel. (Two waveforms are orthogonal
if their crosscorrelation is zero; a set of waveforms is biorthogonal if it consists of
M/2 orthogonal waveforms and their negatives.) If every waveform has energy S, and
the Gaussian noise has two-sided spectral density $N_0/2$, then we say the power
signal-to-noise ratio is $S/(N_0 T)$. Since the information in any transmission is $R_0$
bits, the information rate is $R_0/T$ bits per second; finally, the dimensionless
quantity signal-to-noise ratio per information bit is $S/(N_0 R_0)$.

$S/(N_0 R_0)$ is commonly taken as the criterion of efficiency for signaling over
unlimited-bandwidth white Gaussian noise channels. Coding-theorem arguments³⁹ show that
for reliable communication it must exceed ln 2 ≈ .69. Our objective will be to achieve a
given over-all probability of error for fixed $S/(N_0 R_0)$, with minimum complexity of
instrumentation.
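These definitions reduce to simple arithmetic. In this sketch, `energy_over_n0` stands for the waveform energy-to-noise ratio S/N0 (a name chosen here for illustration), each waveform carries R0 = log2 M binary digits, and a code of dimensionless rate r reduces the information per waveform to R0·r bits, which is how the ratio S/(N0 R0 r) used for coded schemes arises.

```python
import math


def snr_per_information_bit(energy_over_n0: float, M: int, r: float = 1.0) -> float:
    """S/(N0 R0 r): waveform energy-to-noise ratio S/N0 per information bit carried.

    Each of the M biorthogonal waveforms carries R0 = log2(M) binary digits, of
    which a fraction r (the dimensionless code rate) are information bits.
    """
    R0 = math.log2(M)
    return energy_over_n0 / (R0 * r)


LIMIT = math.log(2)  # reliable communication needs S/(N0 R0) above ln 2

# Uncoded 64-ary signaling with S/N0 = 30 gives 5 per information bit. At the same
# waveform energy, a rate-15/21 code leaves fewer information bits per waveform.
print(snr_per_information_bit(30.0, 64))                     # 5.0
print(round(snr_per_information_bit(30.0, 64, 15 / 21), 2))  # 7.0
print(round(LIMIT, 3))                                       # 0.693
```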

The general optimal method of demodulating and detecting such waveforms is to
set up a bank of M/2 matched filters. For example, the signals might be orthogonal
sinusoids, and the filters narrow-bandpass filters. In some sense, the complexity of
the receiver is therefore proportional to the number of matched filters that are
required, that is, to M. The bandwidth occupied is also proportional to M.

Another method of generating a set of biorthogonal waveforms, especially interesting
for its relevance to the question of the distinction between modulation and coding, is to
break the T-second interval into (2T/M)-sec subintervals, in each of which either the
positive or the negative of a single basic waveform is transmitted. If we make the
correspondences (positive ↔ 1) and (negative ↔ 0), we can let the M sequences be the
code words of the $(M/2, R_0)$ binary code that results from adding an over-all parity
check to an $(M/2 - 1, R_0)$ BCH code; it can then be shown that the M waveforms so
generated are biorthogonal. If they are detected by matched filters, then we would say that
we were dealing with an M-ary modulation scheme. On the other hand, this $(M/2, R_0)$
code can be shown to have minimum distance M/4, and is thus suitable for a decoding
scheme in which a hard decision on the polarity of each (2T/M)-sec pulse is followed by
a minimum-distance decoder. In this last case we would say that we were dealing with
binary modulation with coding, rather than M-ary modulation as before, though the
transmitted signals were identical. The same sequences could be decoded (or detected) by
many methods intermediate between these extremes, so finely graded that to distinguish
where modulation ends and coding begins could only be an academic exercise.
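The biorthogonality and minimum-distance claims can be checked directly for a small case. The sketch builds the code as the first-order Reed-Muller code RM(1, m), which is the standard description of the extended BCH code the text refers to; that identification, and the helper names, are assumptions of the sketch rather than the report's construction. Here M/2 = 2^m pulses per waveform, so M = 2^(m+1) and R0 = m + 1.

```python
import itertools


def rm1_codewords(m: int):
    """All 2**(m+1) code words of the first-order Reed-Muller code RM(1, m).

    Each code word lists the values of an affine function a0 + a.x over GF(2)
    at all 2**m points x of GF(2)^m, so the length is 2**m = M/2.
    """
    points = list(itertools.product((0, 1), repeat=m))
    return [
        tuple((a0 + sum(ai * xi for ai, xi in zip(a, x))) % 2 for x in points)
        for a0 in (0, 1)
        for a in itertools.product((0, 1), repeat=m)
    ]


def to_pulses(word):
    """Pulse polarities: bit 1 -> positive pulse (+1), bit 0 -> negative pulse (-1)."""
    return [1 if b else -1 for b in word]


def check_biorthogonal(m: int):
    """Return the set of pairwise cross-correlations and the minimum Hamming distance."""
    words = rm1_codewords(m)
    pairs = list(itertools.combinations(words, 2))
    correlations = {sum(u * v for u, v in zip(to_pulses(a), to_pulses(b))) for a, b in pairs}
    d_min = min(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return correlations, d_min


# m = 3: M = 16 waveforms of M/2 = 8 pulses each.
corr, d_min = check_biorthogonal(3)
print(len(rm1_codewords(3)), sorted(corr), d_min)  # 16 [-8, 0] 4
```

Every pair of distinct waveforms is either orthogonal (correlation 0) or antipodal (correlation -M/2), and the minimum distance is M/4, as stated.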

We use maximum-likelihood decoding for the biorthogonal waveforms; the corresponding
decision rule for a matched-filter detector is to choose the waveform corresponding
to the matched filter whose output at the appropriate sample time is the greatest
in magnitude, with the sign of that output. Approximations to the probability of incorrect
decision with this rule are discussed in Appendix B. In some cases, we permit the
detector not to make a decision (that is, to signal a deletion) when no matched-filter
output has a magnitude exceeding that of every other output by at least a threshold D;
in Appendix B we also discuss the probabilities of deletion and of incorrect decision in
this case.
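The decision rule, including the optional deletion threshold, can be sketched as follows. The sketch assumes the matched-filter outputs are already available as real numbers; the function name is hypothetical, and `threshold` plays the role of the threshold D in the text (a threshold of zero never deletes).

```python
from typing import Optional, Sequence, Tuple


def biorthogonal_decision(outputs: Sequence[float], threshold: float = 0.0) -> Optional[Tuple[int, int]]:
    """Matched-filter decision for biorthogonal signals, with optional deletions.

    outputs: the M/2 matched-filter samples at the decision time.
    Returns (index of the chosen filter, sign +1 or -1), or None to signal a
    deletion when the largest magnitude does not exceed every other magnitude
    by at least `threshold`.
    """
    mags = [abs(x) for x in outputs]
    best = max(range(len(mags)), key=mags.__getitem__)
    runner_up = max((m for i, m in enumerate(mags) if i != best), default=0.0)
    if mags[best] - runner_up < threshold:
        return None
    return best, (1 if outputs[best] >= 0 else -1)


print(biorthogonal_decision([0.1, -2.0, 0.5]))                 # (1, -1)
print(biorthogonal_decision([0.1, -2.0, 1.8], threshold=0.5))  # None
```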

We consider the following possibilities of concatenating coding with M-ary modulation
to achieve a specified probability of error and signal-to-noise ratio per information bit.
First, we consider modulation alone, with $R_0$ chosen large enough that the
specifications are satisfied. Next, we consider a single stage of coding, with a number of
values of $R_0$, and with both errors-only and deletions-and-errors decoding. (If r is the
dimensionless rate of the code, the signal-to-noise ratio per information bit is now
$S/(N_0 R_0 r)$.) Finally, we consider two stages of coding, or really three-stage
concatenation.

Tables 9-11 are representative of the lists that were obtained. Table 9 gives the
results for $S/(N_0 R_0 r) = 5$ and Pr(e) = 10^-12; Table 10 for $S/(N_0 R_0 r) = 2$ and
Pr(e) = 10^-12; and Table 11 for $S/(N_0 R_0 r) = 2$ and Pr(e) = 10^-3. Again, one cannot
pick the 'best' scheme unambiguously; however, the schemes in which M is large enough that
a single Reed-Solomon code of length less than M can meet the required specifications would
seem to be very much the simplest, unless some considerations other than those that we have
contemplated heretofore were significant.

To organize our information about these codes, we choose to ask the question: For
a fixed M and specified Pr(e), which RS code of length M - 1 requires the minimum
signal-to-noise ratio per information bit? Tables 12-15 answer this question for
$R_0 \le 9$ (after which the computer overflowed), and for Pr(e) = 10^-3, 10^-6, 10^-9,
and 10^-12. Except in Table 15, we have considered only errors-only decoding, since
Table 15 shows that, even for Pr(e) = 10^-12, allowing deletions-and-errors decoding
improves things very little, to the accuracy of our bounds, and does not affect the
character of the results. The $S/(N_0 R_0)$ needed to achieve the required probability
of error without coding, for $R_0 \le 20$, is also indicated.

a.  Discussion 

Let us first turn our attention to Table 9, which has the richest selection of diverse
schemes, as well as being entirely representative of all of the lists that we generated.
Certain similarities to the lists for discrete memoryless channels are immediately
evident. For instance, the use of deletions allows some shortening and simplification of
the outer decoder, though not as much as before. Also, for fixed M, going to two stages
of coding rather than one lessens the computational demands on the decoders, at the
price of much increased block length.

It seems clear that it is more efficient to let M become large enough that two
stages of coding are unnecessary, and in fact large enough that a single RS code can be
used. As M falls below this size, the needed complexity of the codes would seem to
increase much more rapidly than that of the modulation decreases, while for larger M
the reverse is true. The explanation is that a certain M is required to drive the
probability of detection error down to the point where coding techniques become powerful,
for $S/(N_0 R_0)$ somewhat less than the final signal-to-noise ratio per information bit.
Once this moderate probability has been achieved, it would seem to be wasteful to use
modulation techniques to drive it much lower by increasing M. Tables 10 and 11 illustrate
this point by showing that this critical M is not greatly affected by an enormous change in
required Pr(e).
Table 9. Modulation and coding that achieve Pr(e) < 10^-12 with a signal-to-noise ratio
per information bit of 5, on a Gaussian channel.

M       (N,K)       D    T    (n,k)     d    t    kKR0   d/b     Comment
16384   --          --   --   --        --   --   14     571.4   no coding
64      (21,15)     7    3    --        --   --   90     7.47    e-o
64      (20,12)     9    4    --        --   --   72     8.89    e-o
32      (26,18)     9    4    --        --   --   90     4.62    e-o
32      (26,16)     11   5    --        --   --   80     5.20    e-o
16      (155,136)   11   5    --        --   --   544    2.28    e-o
16      (90,67)     13   6    --        --   --   268    2.69    e-o
16      (85,58)     15   7    --        --   --   232    2.93    e-o
16      (80,50)     17   8    --        --   --   200    3.20    e-o
16      (75,43)     19   9    --        --   --   172    3.49    e-o
8       (236,184)   21   10   --        --   --   552    1.71    e-o
8       (201,138)   25   12   --        --   --   414    1.94    e-o
8       (197,124)   29   14   --        --   --   372    2.12    e-o
2       (511,358)   37   18   --        --   --   358    1.43    e-o
2       (481,310)   41   20   --        --   --   310    1.55    e-o
2       (461,254)   51   25   --        --   --   254    1.81    e-o
64      (43,37)     7    1    --        --   --   222    6.20    d&e
64      (41,33)     9    1    --        --   --   198    6.63    d&e
64      (26,22)     5    2    --        --   --   132    6.30    d&e
64      (19,13)     7    2    --        --   --   78     7.79    d&e
64      (22,14)     9    2    --        --   --   84     8.38    d&e
64      (18,12)     7    3    --        --   --   72     8.00    d&e
32      (29,23)     7    2    --        --   --   115    4.03    d&e
32      (30,22)     9    2    --        --   --   110    4.36    d&e
32      (25,19)     7    3    --        --   --   95     4.21    d&e
32      (22,14)     9    3    --        --   --   70     5.03    d&e
16      (127,108)   11   3    --        --   --   432    2.35    d&e
16      (117,94)    13   3    --        --   --   376    2.49    d&e
16      (81,62)     11   4    --        --   --   248    2.61    d&e
16      (79,56)     13   4    --        --   --   224    2.82    d&e
16      (73,50)     13   6    --        --   --   200    2.92    d&e
16      (15,11)     5    2    (25,21)   5    2    924    3.25    e-o
8       (43,36)     5    2    (77,69)   9    4    7452   1.78    e-o
8       (48,37)     7    3    (48,42)   7    3    4662   1.98    e-o
8       (63,49)     9    4    (31,27)   5    2    3969   1.97    e-o
2       (63,45)     7    3    (92,80)   13   6    3600   1.61    e-o
2       (63,39)     9    4    (92,82)   11   5    3198   1.81    e-o
2       (63,36)     11   5    (63,55)   9    4    1980   2.00    e-o

Notes: Tables 9-11.

N, K, D, T, n, k, d, t have been defined in Section I
M = number of biorthogonal signals transmitted
kKR0 = total bits of information in a block
d/b = dimensions required (nNM/(2kKR0)) per information bit.
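The kKR0 and d/b columns follow from the definitions in these notes: each waveform carries R0 = log2 M binary digits and occupies M/2 signal dimensions, so a block of nN waveforms uses nNM/2 dimensions to carry kKR0 information bits. The sketch below (helper name hypothetical) recomputes two rows of Table 9: the M = 64 entry with the (21,15) code, and the M = 8 entry with the (48,37) inner and (48,42) outer codes.

```python
from math import log2


def bits_and_dimensions(M: int, N: int, K: int, n: int = 1, k: int = 1):
    """(kKR0, d/b) for M-ary biorthogonal signals with an inner (N, K) code and an
    optional outer (n, k) code; n = k = 1 describes a single stage of coding.

    Each waveform occupies M/2 signal dimensions, so a block uses nN*M/2 dimensions
    and carries kKR0 information bits, with R0 = log2(M).
    """
    R0 = log2(M)
    info_bits = k * K * R0
    return info_bits, n * N * M / (2 * info_bits)


for row in [(64, 21, 15), (8, 48, 37, 48, 42)]:
    bits, dims = bits_and_dimensions(*row)
    print(row, int(bits), round(dims, 2))
# (64, 21, 15) 90 7.47
# (8, 48, 37, 48, 42) 4662 1.98
```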
Table 10. Modulation and coding that achieve Pr(e) < 10^-12 with a signal-to-noise ratio
per information bit of 2, on a Gaussian channel.

M     (N,K)       D    T    (n,k)       d    t    Comment
512   (211,167)   45   22   --          --   --   e-o
512   (261,209)   43   21   --          --   --   e-o
512   (311,271)   41   20   --          --   --   e-o
256   (255,195)   61   30   --          --   --   e-o
128   (127,97)    31   15   (127,119)   9    4    e-o
128   (127,99)    29   14   (127,117)   11   5    e-o
128   (127,101)   27   13   (127,124)   4    0    d&e
128   (127,104)   24   11   (127,122)   6    0    d&e
128   (127,104)   24   10   (127,120)   8    0    d&e

Note: The special RS bound on weights in section 3.3a has been used to compute
probabilities for the last three codes. With the general bound of Appendix B, it
appears that deletions are no help.


Table 11. Modulation and coding that achieve Pr(e) < 10^-3 with a signal-to-noise ratio
per information bit of 2, on a Gaussian channel.

M       (N,K)       D    T    (n,k)   d    t    Comment
16384   --          --   --   --      --   --   no coding
256     (37,27)     11   5    --      --   --   e-o
256     (45,37)     9    4    --      --   --   e-o
128     (48,34)     15   7    --      --   --   e-o
128     (50,38)     13   6    --      --   --   e-o
64      (895,719)   91   45   --      --   --   e-o

Note: Deletions are no help.


Tables 12-15. Minimum S/(N0 R0 r) achievable on a Gaussian channel.

       Table 12: Pr(e) = 10^-3    Table 13: Pr(e) = 10^-6    Table 14: Pr(e) = 10^-9
R0     no code  RS code  t        no code  RS code  t        no code  RS code  t
1      4.78                       11.30                      17.98
2      5.42                       11.96                      18.66
3      4.28     4.23     1        8.68     7.34     1        13.16    10.42    1
4      3.57     3.11     3        6.92     4.59     3        10.28    6.01     3
5      3.12     2.41     5        5.83     3.19     5        8.52     3.88     6
6      2.81     2.02     9        5.09     2.44     10       7.34     2.80     11
7      2.59     1.77     18       4.56     2.01     19       6.49     2.21     19
8      2.41     1.61     33       4.16     1.76     34       5.85     1.88     35
9      2.28     1.50     62       3.85     1.60     64       5.35     1.67     65
10     2.16                       3.60                       4.95
11*    2.18                       3.40                       4.63
12*    2.11                       3.23                       4.35
14*    2.00                       2.96                       3.93
16*    1.92                       2.76                       3.61
18*    1.85                       2.60                       3.36
20*    1.80                       2.48                       3.16

Table 15. Pr(e) = 10^-12

R0     no code  RS code  t     superchannel Pr(e)   RS code (d&e)
1      24.74
2      25.42
3      17.67    13.53    1     .0000002             13.60
4      13.67    7.45     3     .0001                ?.86
5      11.23    4.54     6     .002                 4.25
6      9.60     3.13     11    .009                 3.02
7      8.43     2.40     20    .02                  2.38
8      7.55     1.98     36    .036
9      6.86     1.73     67    .05
10     6.31
11*    5.86
12*    5.49
14*    4.90
16*    4.46
18*    4.11
20*    3.84


Notes: Tables 12-15.

R0 = log2 M
no code = minimum signal-to-noise ratio per information bit achievable without coding
RS code = minimum signal-to-noise ratio per information bit achievable with an RS code
of length M - 1
t = number of errors which the RS code must correct
RS code (d&e) = minimum signal-to-noise ratio per information bit achievable by an
RS code correcting t errors and 2t deletions.
* For these values of R0 a weaker probability bound was used (see Appendix B).
Since the RS codes are the most efficient of the BCH class with respect to the number
of check digits required to achieve a certain minimum distance, and hence
error-correction capability, another important effect of increasing M is to make the symbol
field GF(M) large enough that RS codes of the necessary block lengths can be realized.
Once M is large enough to do this, further increases result in no further increase of
efficiency in this respect.

Tables  lZ-15  are  presented  as  much  for  reference  as  for  a  source  of  further  insi^it. 

It is interesting to note that, for a given M, the same RS code is approximately optimum over a wide range of required Pr(e). No satisfactory explanation for this constancy has been obtained; lest the reader conjecture that these codes possess some universal optimality, however, it should be mentioned that the same tables computed for a probability distribution other than the Gaussian show markedly different codes as optimum.

Table 15 includes the superchannel probabilities of error seen by the outer coder; they are somewhat higher than the comparable probabilities for the discrete memoryless channel, $10^{-2}$ to $10^{-3}$, but remain in the same approximate range.

6.3 SUMMARY

A most interesting conclusion emerges from these calculations. A distinct division of function between the outer code and the inner stages (of modulation, or inner coding, or perhaps both) is quite apparent. The task of the inner stages, while somewhat exceeding the specified rate or $S/(N_0 R_N)$, is to turn the raw channel into a superchannel with a moderate probability of error and enough inputs that an RS code may be used as the outer code. The function of the outer code is then to drive the over-all probability of error as low as desired, at a dimensionless rate close enough to one not to hurt the over-all rate or $S/(N_0 R_N)$ badly.

For future work, two separate problems of design are suggested. The first is the most efficient realization of RS encoders and decoders, with which we were concerned in Section IV. The second, which has been less explored, is the problem of efficiently realizing a moderate probability of error for given specifications. Communication theory has previously focused largely on the problem of achieving negligibly small probabilities of error, but the existence of RS codes solves this problem whenever the problem of achieving a probability of error less than $10^{-3}$, say, can be solved. This last problem is probably better considered from the point of view of modulation theory or signal design than from that of coding theory, whenever the former techniques can be applied to the channel at hand.
APPENDIX A


Variations on the BCH Decoding Algorithm


A. 1 ALTERNATIVE DETERMINATION OF ERROR VALUES

The point of view which led us to the erasure-correction procedure of section 4.5 also leads us to another method of determining the values of the errors. Suppose the number of errors t has been discovered; then the t × t matrix M has rank t and therefore a nonzero determinant. Let the decoder now determine the locator $X_{j_0}$ of any error. If we were to guess the corresponding error value $e_{j_0}$ and modify the $T_n$ accordingly, the guessed word would still have either t or (on the chance of a correct guess) t - 1 errors; thus the t × t matrix formed from the new $T_n$ would have zero determinant if and only if the guess were correct. In general one would expect this argument to yield a polynomial of degree t in $e_{j_0}$ as the equation of condition, but because of the special form of M this equation is only of first degree, and an explicit formula for $e_{j_0}$ can be obtained.
In symbols, let

$$S'_{(m_0+n+s,\,m_0+n)} = S_{(m_0+n+s,\,m_0+n)} - e_{j_0} X_{j_0}^{(m_0+n+s,\,m_0+n)}.$$

Then

$$T'_n = S'_{(m_0+n+s,\,m_0+n)} = S_{(m_0+n+s,\,m_0+n)} - e_{j_0} X_{j_0}^{(m_0+n+s,\,m_0+n)} = T_n - e_{j_0} X_{j_0}^{m_0+n}\,\sigma_d(X_{j_0}) = T_n - E_{j_0} X_{j_0}^{n}.$$


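$$|M'| = \begin{vmatrix} T_{2t_0-2} - E_{j_0}X_{j_0}^{2t_0-2} & T_{2t_0-3} - E_{j_0}X_{j_0}^{2t_0-3} & \cdots & T_{2t_0-t-1} - E_{j_0}X_{j_0}^{2t_0-t-1} \\ T_{2t_0-3} - E_{j_0}X_{j_0}^{2t_0-3} & T_{2t_0-4} - E_{j_0}X_{j_0}^{2t_0-4} & \cdots & T_{2t_0-t-2} - E_{j_0}X_{j_0}^{2t_0-t-2} \\ \vdots & \vdots & & \vdots \\ T_{2t_0-t-1} - E_{j_0}X_{j_0}^{2t_0-t-1} & T_{2t_0-t-2} - E_{j_0}X_{j_0}^{2t_0-t-2} & \cdots & T_{2t_0-2t} - E_{j_0}X_{j_0}^{2t_0-2t} \end{vmatrix}$$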


Let us expand this determinant into $2^t$ determinants, using the fact that the determinant of a matrix which has the vector (a + b) as a row is the sum of the determinants of the two matrices which have a and b in that row, respectively. We classify the resulting determinants by the number of rows which have $E_{j_0}$ as a factor.

There is one determinant with no row containing $E_{j_0}$, which is simply |M|.
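There are t determinants with one row having $E_{j_0}$ as a factor. For example, the first is

$$\begin{vmatrix} -E_{j_0}X_{j_0}^{2t_0-2} & -E_{j_0}X_{j_0}^{2t_0-3} & \cdots & -E_{j_0}X_{j_0}^{2t_0-t-1} \\ T_{2t_0-3} & T_{2t_0-4} & \cdots & T_{2t_0-t-2} \\ \vdots & \vdots & & \vdots \\ T_{2t_0-t-1} & T_{2t_0-t-2} & \cdots & T_{2t_0-2t} \end{vmatrix}$$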

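There are $\binom{t}{2}$ determinants with two rows having $E_{j_0}$ as a factor. The first is

$$\begin{vmatrix} -E_{j_0}X_{j_0}^{2t_0-2} & -E_{j_0}X_{j_0}^{2t_0-3} & \cdots & -E_{j_0}X_{j_0}^{2t_0-t-1} \\ -E_{j_0}X_{j_0}^{2t_0-3} & -E_{j_0}X_{j_0}^{2t_0-4} & \cdots & -E_{j_0}X_{j_0}^{2t_0-t-2} \\ T_{2t_0-4} & T_{2t_0-5} & \cdots & T_{2t_0-t-3} \\ \vdots & \vdots & & \vdots \\ T_{2t_0-t-1} & T_{2t_0-t-2} & \cdots & T_{2t_0-2t} \end{vmatrix}$$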


But in this determinant the first row is simply $X_{j_0}$ times the second, so that the determinant is zero. Furthermore, in all such determinants with two or more rows having $E_{j_0}$ as a factor, these rows will be multiples of each other by some power of $X_{j_0}$, so that all such determinants are zero.

The t determinants with one row having $E_{j_0}$ as a factor are all linear in $E_{j_0}$, and contain explicit powers of $X_{j_0}$ between $2t_0 - 2t$ and $2t_0 - 2$; their sum is then

$$-E_{j_0} X_{j_0}^{2t_0-2t}\, P(X_{j_0}),$$

where P(X) is a polynomial of degree 2t - 2 whose coefficients are functions of the original $T_n$.

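Finally, we recall that $E_{j_0} = e_{j_0} X_{j_0}^{m_0}\,\sigma_d(X_{j_0})$ and that $|M'| = 0$ if and only if $e_{j_0}$ is chosen correctly, from which we get the equation of condition

$$0 = |M'| = |M| - E_{j_0} X_{j_0}^{2t_0-2t}\, P(X_{j_0}),$$

so

$$e_{j_0} = \frac{|M|}{X_{j_0}^{m_0+2t_0-2t}\,\sigma_d(X_{j_0})\,P(X_{j_0})}. \tag{A.1}$$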


|M| can easily be obtained as a by-product of the reduction of M. The only term in the denominator of (A.1) that is not readily calculable is $P(X_{j_0})$. In general, if $A_{ik}$ is the determinant of the matrix remaining after the ith row and kth column are struck from M, then

$$P(X) = \sum_{i=1}^{t}\sum_{k=1}^{t} (-1)^{i+k} A_{ik}\, X^{2t-i-k}.$$
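The derivation of (A.1) can be checked numerically. The following is a minimal sketch under stated assumptions, not the decoder itself: it works over the rationals rather than GF(q), so the cofactor signs in $P(X) = \sum_{i,k}(-1)^{i+k}A_{ik}X^{2t-i-k}$ matter; it generates the $T_n$ as $\sum_j E_j X_j^n$ from hypothetical locators and values; it returns $E_{j_0}$ (dividing by $X_{j_0}^{m_0}\sigma_d(X_{j_0})$ would give $e_{j_0}$); and it uses Laplace expansion, which is sensible only for small t. All function names are illustrative.

```python
from fractions import Fraction


def det(m):
    """Exact determinant by Laplace expansion along the first row (small t only)."""
    if not m:
        return Fraction(1)
    total = Fraction(0)
    for k in range(len(m)):
        minor = [row[:k] + row[k + 1:] for row in m[1:]]
        total += (-1) ** k * m[0][k] * det(minor)
    return total


def build_m(T, t, t0):
    """t x t matrix with entry (i, k) = T[2*t0 - i - k], i and k counted from 1."""
    return [[T[2 * t0 - i - k] for k in range(1, t + 1)] for i in range(1, t + 1)]


def cofactor_minor(m, i, k):
    """A_ik: determinant after striking row i and column k (both 1-based)."""
    return det([row[:k - 1] + row[k:] for r, row in enumerate(m, start=1) if r != i])


def p_poly(m, t, x):
    """P(x) = sum over i, k of (-1)^(i+k) A_ik x^(2t-i-k)."""
    return sum((-1) ** (i + k) * cofactor_minor(m, i, k) * Fraction(x) ** (2 * t - i - k)
               for i in range(1, t + 1) for k in range(1, t + 1))


def error_value(T, t, t0, x):
    """E_j0 = |M| / (x^(2t0-2t) P(x)); this is (A.1) with X^m0 sigma_d(X) kept inside E."""
    m = build_m(T, t, t0)
    return det(m) / (Fraction(x) ** (2 * t0 - 2 * t) * p_poly(m, t, x))


# Hypothetical example: three errors with arbitrarily chosen locators and values.
X = [2, 3, 5]
E = [7, -4, 11]
t = t0 = 3
T = [sum(e * x ** n for e, x in zip(E, X)) for n in range(2 * t0 - 1)]
print([error_value(T, t, t0, x) for x in X])
```

Each call should reproduce the corresponding entry of E, because subtracting the correct $E_{j_0}X_{j_0}^n$ leaves only t - 1 terms in the $T_n$ and so makes $|M'|$ vanish.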


A simplification occurs when we are in a field of characteristic two. For note that $A_{ik} = A_{ki}$ because of the diagonal symmetry of M, and that in characteristic two every sign $(-1)^{i+k}$ equals 1. Any sum $\sum_{i+k=\ell} A_{ik}$ will consist entirely of pairs $A_{ik} + A_{ki} = 0$, unless $\ell$ is even, when the entire sum equals $A_{jj}$, where $j = \ell/2$. Then

$$P(X) = \sum_{j=1}^{t} A_{jj}\, X^{2t-2j}. \tag{A.2}$$

Evaluation of the coefficients of P(X) in a field of characteristic two therefore involves calculating t (t-1) × (t-1) determinants.
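In characteristic two the same computation needs only the diagonal minors, via $P(X) = \sum_j A_{jj}X^{2t-2j}$. The sketch below assumes GF(16) built from the primitive polynomial $x^4 + x + 1$ (our choice; the text fixes no field at this point), hypothetical error locators and values, and again computes $E_{j_0}$ rather than $e_{j_0}$. It illustrates (A.1) and (A.2); it does not try to count multiplications.

```python
# GF(2^4) from the primitive polynomial x^4 + x + 1 (an illustrative choice).
EXP, LOG = [0] * 30, [0] * 16
v = 1
for e in range(15):
    EXP[e] = EXP[e + 15] = v
    LOG[v] = e
    v = (v << 1) ^ (0x13 if v & 0x8 else 0)


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def inv(a):
    return EXP[(15 - LOG[a]) % 15]


def pw(a, n):
    if n == 0:
        return 1
    return 0 if a == 0 else EXP[LOG[a] * n % 15]


def det2(m):
    """Determinant over GF(16): Laplace expansion, every sign is +1 and addition is XOR."""
    if not m:
        return 1
    total = 0
    for k in range(len(m)):
        total ^= mul(m[0][k], det2([row[:k] + row[k + 1:] for row in m[1:]]))
    return total


def error_value_char2(T, t, t0, x):
    """E_j0 = |M| / (x^(2t0-2t) P(x)) with P(x) = sum_j A_jj x^(2t-2j), as in (A.2)."""
    m = [[T[2 * t0 - i - k] for k in range(1, t + 1)] for i in range(1, t + 1)]
    p = 0
    for j in range(1, t + 1):
        a_jj = det2([row[:j - 1] + row[j:] for r, row in enumerate(m, start=1) if r != j])
        p ^= mul(a_jj, pw(x, 2 * t - 2 * j))
    return mul(det2(m), inv(mul(pw(x, 2 * t0 - 2 * t), p)))


# Hypothetical errors at alpha^14, alpha^6, alpha^3 with values alpha^2, alpha^9, 1.
X = [EXP[14], EXP[6], EXP[3]]
E = [EXP[2], EXP[9], 1]
t = t0 = 3
T = []
for n in range(2 * t0 - 1):
    s = 0
    for e, x in zip(E, X):
        s ^= mul(e, pw(x, n))
    T.append(s)
print([error_value_char2(T, t, t0, x) == e for x, e in zip(X, E)])
```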


A.  11  Example 

Let  the  decoder  have  solved  Eqs.  50  as  before,  obtaining  as  a  by-product  |m^|  =  c^. 
Trivially, 


■^2  ■  “^4  "  ° 


13 


Ajj  =  T2=0. 


The first error locator that it will discover is $X_1 = \alpha^{14}$. Then, from Eq. A.1,


|m,I 


e,  = 


1"  ,,  ,  X,  o  .  "  12,  13^  14^  10.  13 


=  c 


Similarly,  when  it  discovers  =  a 

6 


11 


®2  '  3,  7 


11.  10.  13 

alo+C'C  +a  )a 


=  a. 


Then it can solve for $d_1$ and $d_2$ as before.


A.  12  Remarks 


The procedure just described for determining error values is clearly applicable in principle to the determination of erasure values. In the latter case, however, $\sigma_d$ must be replaced by the vector of elementary symmetric functions of the s - 1 erasure locators other than the one being considered, and the original modified cyclic parity checks $T_n$ by the modified cyclic parity checks defined on the other s - 1 erasure locators. This means that the determinants appearing in Eq. A.2, as well as |M|, would have to be recomputed to solve for each erasure. In contrast to the solution for the error values, this promises to be tedious and to militate against this method in practice. We mention this possibility only because it does allow calculation of the correct value of an erasure, given only the number of errors and the positions of the other erasures, without knowledge of the locations or values of the errors, a capability which might be useful in some applications.

The erasure-correction scheme with no errors (section 4.5) can be seen to be a special case of this algorithm.


After we have located the errors, we have the option of solving for the error values directly by (A.1), or indirectly, by treating the errors as erasures and using Eq. 50. If we choose the former method, we need the t (t-1) × (t-1) determinants $A_{jj}$ of (A.2). In general this requires a number of multiplications that rapidly becomes too large as t grows. There is a method of calculating all $A_{jj}$ at once which seems feasible for moderate values of t. We assume a field of characteristic two.


Let $B_{a_1, a_2, \dots, a_j}$ be the determinant of the j × j matrix which remains when all the rows and columns but the $a_1$th, $a_2$th, ..., $a_j$th are struck from M. In this notation

$$|M| = B_{1,2,\dots,t} \quad\text{and}\quad A_{jj} = B_{1,\dots,j-1,\,j+1,\dots,t}.$$


The reader, by expanding B in terms of the minors of its last row and cancelling those terms which because of symmetry appear twice, may verify that

$$B_{a_1,\dots,a_j} = T_{2t_0-2a_j}\,B_{a_1,\dots,a_{j-1}} + T^2_{2t_0-a_j-a_{j-1}}\,B_{a_1,\dots,a_{j-2}} + T^2_{2t_0-a_j-a_{j-2}}\,B_{a_1,\dots,a_{j-3},a_{j-1}} + \cdots$$

The use of this recursion relation allows calculation of all $A_{jj}$ with $N_t$ multiplications (not counting squares), where, for small t, $N_2 = 0$ (see section A.11), $N_3 = 3$, $N_4 = 15$, $N_5 = 38$, $N_6 = 86$, $N_7 = 172$, $N_8 = 333$, $N_9 = 616$.
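The characteristic-two recursion, in which each $B_{a_1,\dots,a_j}$ equals its diagonal term times $B_{a_1,\dots,a_{j-1}}$ plus, for every earlier index $a_i$, the squared off-diagonal entry times the B with $a_i$ also removed, can be checked against direct determinant evaluation. This is a sketch assuming GF(16) from $x^4 + x + 1$ and arbitrary (hypothetical) values for the $T_n$; the recursion uses only the symmetry of M, so the $T_n$ need not come from an actual received word. The hypothetical helper `make_b` memoizes B on the tuple of retained indices, so |M| and every $A_{jj}$ share subresults; it makes no attempt to reproduce the counts $N_t$.

```python
from functools import lru_cache
from itertools import combinations

# GF(2^4) from the primitive polynomial x^4 + x + 1 (an illustrative choice).
EXP, LOG = [0] * 30, [0] * 16
v = 1
for e in range(15):
    EXP[e] = EXP[e + 15] = v
    LOG[v] = e
    v = (v << 1) ^ (0x13 if v & 0x8 else 0)


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def det2(m):
    """Reference determinant over GF(16) by Laplace expansion (all signs are +1)."""
    if not m:
        return 1
    total = 0
    for k in range(len(m)):
        total ^= mul(m[0][k], det2([row[:k] + row[k + 1:] for row in m[1:]]))
    return total


def make_b(T, t0):
    """Return B(a) for a sorted tuple a of 1-based row/column indices of M."""
    @lru_cache(maxsize=None)
    def B(a):
        if not a:
            return 1
        rest, aj = a[:-1], a[-1]
        total = mul(T[2 * t0 - 2 * aj], B(rest))           # diagonal entry of the last row
        for idx, ai in enumerate(rest):                   # squared off-diagonal entries
            entry = T[2 * t0 - ai - aj]
            total ^= mul(mul(entry, entry), B(rest[:idx] + rest[idx + 1:]))
        return total
    return B


t = t0 = 4
T = [EXP[(7 * n + 2) % 15] for n in range(2 * t0 - 1)]   # arbitrary nonzero values
B = make_b(T, t0)
det_m = B(tuple(range(1, t + 1)))
a_diag = [B(tuple(i for i in range(1, t + 1) if i != j)) for j in range(1, t + 1)]
print(det_m, a_diag)
```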

Once the $A_{jj}$ are obtained, the denominator of (A.1) can be expressed as a single polynomial E(X) by st multiplications; E(X) has terms in $X^m$, $m_0 + 2t_0 - 2t \le m \le m_0 + 2t_0 + s$, or a total of 2t + s + 1 terms. The value of E(X) can therefore be obtained for each X in turn by the Chien³¹ method of solving for the roots of $\sigma_e(X)$, and in fact these two calculations may be done simultaneously. Whenever X is a root of $\sigma_e(X)$, the denominator of (A.1) for that locator will appear as the current value of E(X). Since |M| will have been obtained as a by-product of solving for $\sigma_e(X)$, an inversion and a multiplication will give the error value corresponding to $X_{j_0} = X$. Another n(s + 2t) multiplications are involved here, and s + 2t memory registers.

In order to compare the alternative methods of finding error values, we simply compare the number of multiplications needed in each case, leaving aside all analysis of any other equipment or operations needed to realize either algorithm. We recall that the values of s erasures can be determined with approximately 2s(s-1) multiplications. For the first method, we need approximately $N_t$ multiplications to find the error values, and 2s(s-1) to find the erasures; for the second, 2(s+t)(s+t-1) to find both the erasures and the errors. Using the values of $N_t$ given earlier, we find that the former method requires fewer multiplications when t < 7, which suggests that it ought to be considered whenever the minimum distance of the code is 15 or less.


A. 2 ALTERNATIVE DETERMINATION OF ERROR LOCATIONS

Continued development of the point of view expressed above gives us an alternative method of locating the errors. If we tentatively consider a received symbol as an erasure, in a received word with t errors, then the resulting word has t - 1 errors if the trial symbol was in error. The vanishing of the t × t determinant |M''| formed from the $T''_n$, defined now by s + 1 erasure locators, then indicates the error locations. The reader may verify that, if $X_{j_0}$ is the locator of the trial symbol,

$$T''_n = T_{n+1} - X_{j_0} T_n$$

and

$$M'' = \begin{bmatrix} T_{2t_0-1} - X_{j_0}T_{2t_0-2} & T_{2t_0-2} - X_{j_0}T_{2t_0-3} & \cdots & T_{2t_0-t} - X_{j_0}T_{2t_0-t-1} \\ T_{2t_0-2} - X_{j_0}T_{2t_0-3} & T_{2t_0-3} - X_{j_0}T_{2t_0-4} & \cdots & T_{2t_0-t-1} - X_{j_0}T_{2t_0-t-2} \\ \vdots & \vdots & & \vdots \\ T_{2t_0-t} - X_{j_0}T_{2t_0-t-1} & T_{2t_0-t-1} - X_{j_0}T_{2t_0-t-2} & \cdots & T_{2t_0-2t+1} - X_{j_0}T_{2t_0-2t} \end{bmatrix}.$$


If we expand |M''| by columns, many of the resulting determinants will have one column equal to $-X_{j_0}$ times another. The only ones that will not will be

$$D_0 = \begin{vmatrix} T_{2t_0-1} & T_{2t_0-2} & \cdots & T_{2t_0-t} \\ T_{2t_0-2} & T_{2t_0-3} & \cdots & T_{2t_0-t-1} \\ \vdots & \vdots & & \vdots \\ T_{2t_0-t} & T_{2t_0-t-1} & \cdots & T_{2t_0-2t+1} \end{vmatrix}$$

and so forth. Thus if $X_{j_0}$ is a root of the polynomial D(X) formed from these determinants, |M''| is zero and $X_{j_0}$ is an error locator. It can be checked, by the expansion of $D_0$ into three matrices as was done earlier in the proof that the rank of M is t, that

$$D(X) = D_0\,\sigma_e(X),$$

and this method is entirely equivalent to the former one. Furthermore, it is clear that


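$$D(X) = \begin{vmatrix} X^t & X^{t-1} & \cdots & X & 1 \\ T_{2t_0-1} & T_{2t_0-2} & \cdots & T_{2t_0-t} & T_{2t_0-t-1} \\ T_{2t_0-2} & T_{2t_0-3} & \cdots & T_{2t_0-t-1} & T_{2t_0-t-2} \\ \vdots & \vdots & & \vdots & \vdots \\ T_{2t_0-t} & T_{2t_0-t-1} & \cdots & T_{2t_0-2t+1} & T_{2t_0-2t} \end{vmatrix}$$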

The condition that this determinant vanish is the generalization to the nonbinary case of the 'direct method' of Chien.³¹ It appears to offer no advantages in practice, for to get the coefficients of D(X) one must find the determinants of t + 1 t × t matrices, whereas the coefficients of the equivalent $\sigma_e(X)$ can be obtained as a by-product of the determination of t.
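The claim that D(X) vanishes exactly at the error locators can be illustrated with a small example. The sketch assumes GF(16) from $x^4 + x + 1$, three hypothetical errors, no erasures ($t_0 = t$), and $T_n = \sum_j E_j X_j^n$. The hypothetical helper `d_poly` evaluates the $(t+1) \times (t+1)$ determinant whose first row is $(X^t, \dots, X, 1)$ and whose remaining rows hold the $T_n$, at every nonzero field element; this is exactly the brute-force cost the text warns against.

```python
# GF(2^4) from the primitive polynomial x^4 + x + 1 (an illustrative choice).
EXP, LOG = [0] * 30, [0] * 16
v = 1
for e in range(15):
    EXP[e] = EXP[e + 15] = v
    LOG[v] = e
    v = (v << 1) ^ (0x13 if v & 0x8 else 0)


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def pw(a, n):
    if n == 0:
        return 1
    return 0 if a == 0 else EXP[LOG[a] * n % 15]


def det2(m):
    """Determinant over GF(16) by Laplace expansion (every sign is +1 in characteristic two)."""
    if not m:
        return 1
    total = 0
    for k in range(len(m)):
        total ^= mul(m[0][k], det2([row[:k] + row[k + 1:] for row in m[1:]]))
    return total


def d_poly(T, t, t0, x):
    """D(x): first row (x^t, ..., x, 1); row i (1..t), column c (0..t) holds T[2*t0 - i - c]."""
    rows = [[pw(x, t - c) for c in range(t + 1)]]
    rows += [[T[2 * t0 - i - c] for c in range(t + 1)] for i in range(1, t + 1)]
    return det2(rows)


# Hypothetical word: errors at alpha^14, alpha^6, alpha^3 with values alpha^2, alpha^9, 1.
X = [EXP[14], EXP[6], EXP[3]]
E = [EXP[2], EXP[9], 1]
t = t0 = 3
T = []
for n in range(2 * t0):
    s = 0
    for e, x in zip(E, X):
        s ^= mul(e, pw(x, n))
    T.append(s)
print(sorted(x for x in range(1, 16) if d_poly(T, t, t0, x) == 0))
```

The lower t rows all lie in the span of the vectors $(X_j^t, \dots, X_j, 1)$, so the determinant is zero precisely when the first row is one of them.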




APPENDIX  B 


Formulas for Computation


We shall now derive and discuss the formulas used for the computations of Section V.

B. 1 OUTER DECODER

Let us consider first the probability of the outer decoder decoding incorrectly, or failing to decode. We shall let $p_e$ be the probability that any symbol is in error, and $p_d$ be the probability that it is erased.

If the outer decoder does errors-only decoding, $p_d = 0$. Let the maximum correctable number of errors be $t_0$; then the probability of decoding error is the probability of $t_0 + 1$ or more symbol errors:

$$\Pr(e) = \sum_{t=t_0+1}^{n} \binom{n}{t} p_e^t (1-p_e)^{n-t}. \tag{B.1}$$

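If the outer decoder does deletions-and-errors decoding, the minimum distance is d, and the maximum number of errors corrected is $t_0$, then the probability of decoding error is the probability that the number of errors t and the number of deletions s satisfy $2t + s \ge d$ or $t \ge t_0 + 1$:

$$\Pr(e) = \sum_{\substack{t,s:\; 2t+s \ge d \\ \text{or } t \ge t_0+1}} \frac{n!}{t!\,s!\,(n-s-t)!}\, p_e^t p_d^s (1-p_e-p_d)^{n-s-t}$$

$$= \sum_{t=0}^{t_0}\sum_{s=d-2t}^{n-t} \frac{n!}{t!\,s!\,(n-s-t)!}\, p_e^t p_d^s (1-p_e-p_d)^{n-s-t} + \sum_{t=t_0+1}^{n} \binom{n}{t} p_e^t (1-p_e)^{n-t}. \tag{B.2}$$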

Equation B.2 is also valid for modified deletions-and-errors decoding, when $t_0$ is the reduced maximum correctable number of errors.
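Equations B.1 and B.2 translate directly into code. In this sketch (function names are ours, and the parameters in the usage line are hypothetical), the multinomial coefficient $n!/(t!\,s!\,(n-s-t)!)$ is computed as $\binom{n}{t}\binom{n-t}{s}$, and the double sum is evaluated exactly rather than through the bounds that follow.

```python
from math import comb


def pr_error_errors_only(n, t0, pe):
    """(B.1): probability of t0 + 1 or more symbol errors among n symbols."""
    return sum(comb(n, t) * pe**t * (1 - pe)**(n - t) for t in range(t0 + 1, n + 1))


def pr_error_del_and_err(n, d, t0, pe, pd):
    """(B.2): failure when 2t + s >= d with t <= t0, or when t >= t0 + 1."""
    total = 0.0
    for t in range(t0 + 1):
        for s in range(max(d - 2 * t, 0), n - t + 1):
            # multinomial n! / (t! s! (n-s-t)!) written as C(n, t) * C(n-t, s)
            total += (comb(n, t) * comb(n - t, s)
                      * pe**t * pd**s * (1 - pe - pd)**(n - s - t))
    return total + pr_error_errors_only(n, t0, pe)


# Hypothetical outer code: n = 15, d = 9, t0 = 4.
print(pr_error_errors_only(15, 4, 0.05), pr_error_del_and_err(15, 9, 4, 0.05, 0.1))
```

With $p_d = 0$ the first sum vanishes, since every term there has $s \ge d - 2t \ge 1$, and (B.2) reduces to (B.1).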

For fixed t, we can lower-bound an expression of the form

$$\sum_{s=s_0}^{n-t} \frac{n!}{t!\,s!\,(n-s-t)!}\, p_e^t p_d^s (1-p_e-p_d)^{n-s-t} \tag{B.3}$$

by

$$\sum_{s=s_0}^{t_1} \frac{n!}{t!\,s!\,(n-s-t)!}\, p_e^t p_d^s (1-p_e-p_d)^{n-s-t}. \tag{B.4}$$

To upperbound (B.3), we write it as

$$\sum_{s=s_0}^{t_1} \frac{n!}{t!\,s!\,(n-s-t)!}\, p_e^t p_d^s (1-p_e-p_d)^{n-s-t} + \sum_{s=t_1+1}^{n-t} \frac{n!}{t!\,s!\,(n-s-t)!}\, p_e^t p_d^s (1-p_e-p_d)^{n-s-t}. \tag{B.5}$$

Since the ratio of the (s+1)th to the sth term in the latter series is

$$\frac{(n-s-t)\,p_d}{(s+1)(1-p_e-p_d)} \le \frac{(n-t_1)\,p_d}{t_1(1-p_e-p_d)} \equiv a,$$

Eq. B.5 can be upperbounded, provided a < 1, by

$$\sum_{s=s_0}^{t_1} \frac{n!}{t!\,s!\,(n-s-t)!}\, p_e^t p_d^s (1-p_e-p_d)^{n-s-t} + \frac{1}{1-a}\,\frac{n!}{t!\,(t_1+1)!\,(n-t-t_1-1)!}\, p_e^t p_d^{t_1+1} (1-p_e-p_d)^{n-t-t_1-1}. \tag{B.6}$$

By choosing $t_1$ large enough, the lower and upper bounds of Eqs. B.4 and B.6 may be made as close as desired. In the program of Section V, we let $t_1$ be large enough that the bounds were within 1 per cent of each other. Both (B.1) and (B.2) can then be upperbounded and approximated by (B.6).
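The truncation bounds (B.4) and (B.6) can be sketched as follows, for one fixed t. The sketch bounds the ratio of successive tail terms by $a = (n-t_1)p_d / (t_1(1-p_e-p_d))$, which holds for every $s \ge t_1 + 1$, and refuses to return a bound when $a \ge 1$. The function names and the parameters in the test are hypothetical, not those of the program of Section V.

```python
from math import comb


def term(n, t, s, pe, pd):
    """One term of (B.3): probability of exactly t errors and s deletions."""
    return comb(n, t) * comb(n - t, s) * pe**t * pd**s * (1 - pe - pd)**(n - s - t)


def bounds_b4_b6(n, t, s0, t1, pe, pd):
    """Lower bound (B.4) and upper bound (B.6) on sum_{s=s0}^{n-t} term(n, t, s, pe, pd)."""
    lower = sum(term(n, t, s, pe, pd) for s in range(s0, t1 + 1))
    a = (n - t1) * pd / (t1 * (1 - pe - pd))   # bound on the ratio of successive tail terms
    if a >= 1:
        raise ValueError("t1 is too small: the geometric-series bound needs a < 1")
    upper = lower + term(n, t, t1 + 1, pe, pd) / (1 - a)
    return lower, upper


print(bounds_b4_b6(63, 2, 5, 20, 1e-3, 0.02))
```

Raising $t_1$ shrinks both the neglected tail and the first omitted term, which is how the two bounds are brought within a chosen tolerance of each other.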


B.  2  INNER  DECODER 


If the outer decoder is set to do errors-only decoding, the inner decoder corrects as many errors as it can ($t_0$). Whenever the actual number of errors exceeds $t_0$, the inner decoder will either fail to decode or decode in error, but either of these events constitutes a symbol error to the outer decoder. If the probability of symbol error for the inner decoder is $p_0$, then

$$p_e = \sum_{t=t_0+1}^{n} \binom{n}{t} p_0^t (1-p_0)^{n-t}. \tag{B.7}$$

Equation B.7 can be upperbounded and approximated by Eq. B.6.

If the outer decoder is set for deletions-and-errors decoding, the inner decoder is set to correct whenever there are apparently $t_1$ or fewer errors, where $t_1 < t_0$; otherwise it signals a deletion. If there are more than $t_1$ actual errors, the decoder will either delete or decode incorrectly, so that

$$p_e + p_d = \sum_{t=t_1+1}^{n} \binom{n}{t} p_0^t (1-p_0)^{n-t}.$$

Ordinarily $t_1$ is set so that $p_e \ll p_d$, so that $p_d$ is upperbounded and approximated by

$$p_d \le \sum_{t=t_1+1}^{n} \binom{n}{t} p_0^t (1-p_0)^{n-t}, \tag{B.8}$$

which in turn is upperbounded and approximated by Eq. B.6.

Estimating $p_e$ turns out to be a knottier problem. Of course, if the minimum distance of the inner code is d, no error can occur unless the number of symbol errors is at least $d - t_1$, so that

$$p_e \le \sum_{t=d-t_1}^{n} \binom{n}{t} p_0^t (1-p_0)^{n-t}.$$

This is a valid upper bound but a very weak estimate of $p_e$, since in general many fewer than the total number of t-error patterns will cause errors; most will cause deletions. A tighter bound for $p_e$ depends, however, on knowledge of the distribution of weights in the inner code, which is in general difficult to calculate.
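The inner-decoder estimates (B.7), (B.8) and the weak bound on $p_e$ are all binomial tails in $p_0$. A minimal sketch, with hypothetical code parameters and function names of our own; as the text stresses, the last quantity is a valid but very weak bound on $p_e$.

```python
from math import comb


def binomial_tail(n, start, p0):
    """sum_{t=start}^{n} C(n, t) p0^t (1 - p0)^(n - t)."""
    return sum(comb(n, t) * p0**t * (1 - p0)**(n - t) for t in range(start, n + 1))


def inner_errors_only(n, t0, p0):
    """(B.7): symbol error probability seen by an errors-only outer decoder."""
    return binomial_tail(n, t0 + 1, p0)


def inner_deletions_and_errors(n, d, t1, p0):
    """Return (p_e + p_d, the (B.8) estimate of p_d, the weak upper bound on p_e)."""
    both = binomial_tail(n, t1 + 1, p0)
    return both, both, binomial_tail(n, d - t1, p0)


# Hypothetical inner code: n = 15, d = 7, correcting t1 = 2 apparent errors.
print(inner_errors_only(15, 3, 0.01), inner_deletions_and_errors(15, 7, 2, 0.01))
```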

We can get a weak bound on the number $N_w$ of code words of weight w in any code on GF(q) of length n and minimum distance d as follows. Let $t_0$ be the greatest integer such that $2t_0 < d$. The total number of words of weight $w - t_0$ at distance $t_0$ from a code word of weight w is $\binom{w}{t_0}$, since to get such a word we may change any $t_0$ of the w nonzero symbols in the code word to zeros. The total number of words of weight $w - t_0$ at distance $t_0$ from all code words of weight w is then

$$\binom{w}{t_0} N_w,$$

and all of these are distinct, since no word can be at distance $t_0$ from two different code words. But this number cannot exceed the total number of words of weight $w - t_0$:

$$\binom{n}{w-t_0} (q-1)^{w-t_0}.$$

Therefore

$$N_w \le \frac{\binom{n}{w-t_0}(q-1)^{w-t_0}}{\binom{w}{t_0}} = \frac{n!\,t_0!\,(q-1)^{w-t_0}}{w!\,(n-w+t_0)!}. \tag{B.9}$$
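The weak bound (B.9) can be compared with the exact weight distribution of a maximum-distance-separable code such as an RS code. The sketch below uses the standard MDS weight-distribution formula $N_w = \binom{n}{w}\sum_{j=0}^{w-d}(-1)^j\binom{w}{j}\left(q^{w-d+1-j}-1\right)$, which is not derived in this appendix, and exact rational arithmetic so that the comparison is not affected by rounding. The function names and the example code are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def nw_bound(n, d, q, w):
    """(B.9), with t0 the largest integer such that 2*t0 < d."""
    t0 = (d - 1) // 2
    return Fraction(factorial(n) * factorial(t0) * (q - 1) ** (w - t0),
                    factorial(w) * factorial(n - w + t0))


def mds_weight(n, d, q, w):
    """Exact number of weight-w code words of an (n, n-d+1) MDS code over GF(q), w >= d."""
    return comb(n, w) * sum((-1) ** j * comb(w, j) * (q ** (w - d + 1 - j) - 1)
                            for j in range(w - d + 1))


# Hypothetical example: a (15, 11) RS code over GF(16), d = 5.
for w in range(5, 16):
    print(w, mds_weight(15, 5, 16, w), float(nw_bound(15, 5, 16, w)))
```

The gap between the two columns is what makes (B.10) a valid but loose bound when (B.9) is used in place of the actual $N_w$.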


Now, when the inner code is linear, a decoding error will occur when the error pattern is at distance $t_1$ or less from some nonzero code word. The total number of words at distance k from some code word of weight w is

$$\sum_{i+j+\ell=k} \binom{n-w}{\ell}(q-1)^{\ell}\binom{w}{i}(q-2)^{i}\binom{w-i}{j},$$

since all such words can be obtained by changing any $\ell$ of the n - w zeros to any of the (q-1) nonzero elements, any i of the w nonzero elements to any of the other (q-2) nonzero elements, and any j of the remaining nonzero elements to zeros, where $i + j + \ell = k$. The weight of the resulting word for a particular i, j, $\ell$ will be $w + \ell - j$, so that the probability of getting a word at distance k from a particular code word of weight w is

$$\sum_{i+j+\ell=k} \binom{n-w}{\ell}\binom{w}{i}\binom{w-i}{j}(q-1)^{\ell}(q-2)^{i}\left(\frac{p_0}{q-1}\right)^{w+\ell-j}(1-p_0)^{n-w-\ell+j}.$$

Summing over all code words of all weights $w \ge d$ and all $k \le t_1$, and substituting $j = k - i - \ell \ge 0$, we obtain

$$p_e = \sum_{w=d}^{n}\sum_{k=0}^{t_1}\sum_{i=0}^{k}\sum_{\ell=0}^{k-i} N_w\,\frac{(n-w)!\,w!\,(q-1)^{k-w-i-\ell}(q-2)^{i}\,p_0^{w+2\ell+i-k}(1-p_0)^{n-w-2\ell-i+k}}{\ell!\,(n-w-\ell)!\,i!\,(k-i-\ell)!\,(w-k+\ell)!}.$$


Interchanging sums, substituting the upper bound of (B.9) for $N_w$, and writing the ranges of w, k, i and $\ell$ more suggestively, we have

$$p_e \le \sum_{k \le t_1}\sum_{i \ge 0}\sum_{\ell \ge 0}\sum_{w \ge d} \frac{n!\,t_0!\,(n-w)!\,(q-1)^{k-i-\ell-t_0}(q-2)^{i}\,p_0^{w+2\ell+i-k}(1-p_0)^{n-w-2\ell-i+k}}{\ell!\,(n-w-\ell)!\,i!\,(k-\ell-i)!\,(w-k+\ell)!\,(n-w+t_0)!}.$$


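We now show that the dominant term in this expression is that specified by $k = t_1$, $i = 0$, $\ell = 0$, and $w = d$, and in fact that the whole series is bounded by

$$p_e \le C_1 C_2 C_3 C_4\,\frac{n!\,t_0!\,(q-1)^{t_1-t_0}\,p_0^{d-t_1}(1-p_0)^{n-d+t_1}}{t_1!\,(d-t_1)!\,(n-d+t_0)!}, \tag{B.10}$$

where

$$C_1 = \frac{1}{1-a_1}, \qquad a_1 = \frac{p_0}{1-p_0}\cdot\frac{n-d+t_0}{d-t_1+1};$$

$$C_2 = \frac{1}{1-a_2}, \qquad a_2 = \frac{p_0^2}{(1-p_0)^2}\cdot\frac{(n-d)\,t_1}{(q-1)(d-t_1+1)};$$

$$C_3 = \frac{1}{1-a_3}, \qquad a_3 = \frac{p_0}{1-p_0}\cdot\frac{(q-2)\,t_1}{q-1};$$

$$C_4 = \frac{1}{1-a_4}, \qquad a_4 = \frac{p_0}{1-p_0}\cdot\frac{t_1}{(q-1)(d-t_1+1)};$$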


and it is assumed that the constants $a_m$ are less than one. This result follows from repeated bounding of the series by its first term times a series of the form

$$\sum_{r=0}^{\infty} a_m^r = \frac{1}{1-a_m}.$$

For example, the ratio of the (w+1)th to the wth term is

$$\frac{p_0}{1-p_0}\cdot\frac{n-w-\ell}{n-w}\cdot\frac{n-w+t_0}{w-k+\ell+1} \le a_1,$$

since $w \ge d$, $k \le t_1$, $\ell \ge 0$.

The ratio of the $(\ell+1)$th term to the $\ell$th term is

$$\frac{p_0^2}{(1-p_0)^2\,(q-1)}\cdot\frac{n-w-\ell}{\ell+1}\cdot\frac{k-\ell-i}{w-k+\ell+1} \le a_2;$$

of the (i+1)th to the ith:

$$\frac{p_0}{1-p_0}\cdot\frac{q-2}{q-1}\cdot\frac{k-\ell-i}{i+1} \le a_3;$$

and of the (k-1)th to the kth:

$$\frac{p_0}{(1-p_0)(q-1)}\cdot\frac{k-\ell-i}{w-k+\ell+1} \le a_4.$$

The bound on $p_e$ of Eq. B.10 is a valid upper bound, but not a good approximation, since (B.9) is a weak bound for $N_w$. A tighter bound would follow from better knowledge of $N_w$. In Table 5 we use the actual values of $N_w$ for RS codes, which markedly affects the character of our results.
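To see how (B.10) relates to the full series, the sketch below evaluates the quadruple sum with $N_w$ replaced by the bound (B.9), and the closed-form bound (B.10) with $a_1, \dots, a_4$ taken as worst-case bounds on the ratios of successive terms in w, $\ell$, i and k (the formulas are written out in the code). It assumes a q-ary symmetric channel with symbol error probability $p_0$, each wrong symbol equally likely, which is what the factor $(p_0/(q-1))^{\text{weight}}$ presumes. The parameters and function names are hypothetical.

```python
from math import comb, factorial


def pe_series(n, d, q, t1, p0):
    """The quadruple sum for p_e, with N_w replaced by the (B.9) bound."""
    t0 = (d - 1) // 2
    total = 0.0
    for w in range(d, n + 1):
        nw = factorial(n) * factorial(t0) * (q - 1) ** (w - t0) / (factorial(w) * factorial(n - w + t0))
        for k in range(t1 + 1):
            for i in range(k + 1):
                for ell in range(k - i + 1):
                    j = k - i - ell
                    weight = w + ell - j          # weight of the error pattern
                    total += (nw * comb(n - w, ell) * comb(w, i) * comb(w - i, j)
                              * (q - 1) ** ell * (q - 2) ** i
                              * (p0 / (q - 1)) ** weight * (1 - p0) ** (n - weight))
    return total


def pe_bound_b10(n, d, q, t1, p0):
    """(B.10): leading term (w = d, k = t1, i = l = 0) times C1 C2 C3 C4."""
    t0 = (d - 1) // 2
    r = p0 / (1 - p0)
    a = [r * (n - d + t0) / (d - t1 + 1),                 # ratio bound in w
         r * r * (n - d) * t1 / ((q - 1) * (d - t1 + 1)),  # ratio bound in l
         r * (q - 2) * t1 / (q - 1),                       # ratio bound in i
         r * t1 / ((q - 1) * (d - t1 + 1))]                # ratio bound in k (downward)
    if max(a) >= 1:
        raise ValueError("(B.10) needs every a_m < 1")
    c = 1.0
    for am in a:
        c /= 1 - am
    lead = (factorial(n) * factorial(t0) * (q - 1) ** (t1 - t0) * p0 ** (d - t1)
            * (1 - p0) ** (n - d + t1) / (factorial(t1) * factorial(d - t1) * factorial(n - d + t0)))
    return c * lead


print(pe_series(15, 7, 16, 2, 0.01), pe_bound_b10(15, 7, 16, 2, 0.01))
```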


B. 3 MODULATION ON A GAUSSIAN CHANNEL

We contemplate sending one of $M = 2^{R_N}$ biorthogonal signals over an infinite-bandwidth additive white Gaussian noise channel. A well-known model for such a transmission is this. The M signals are represented by the M (M/2)-dimensional vectors $x_i$, $1 \le i \le M/2$ or $-1 \ge i \ge -M/2$, which are the vectors with zeros in all places but the |i|th, and in that place have ±L according to whether i = ±|i|. (These vectors correspond to what would be observed at the outputs of the bank of M/2 matched filters if the waveforms that they represent, uncorrupted by noise, were the input.)

The actual, noisy outputs of the bank of matched filters are represented by the (M/2)-dimensional vector $y = (y_1, y_2, \dots, y_{M/2})$. If we assume a noise energy per dimension of N, then

$$p(y \mid x_i) = \frac{1}{(2\pi N)^{M/4}} \exp\left(-\frac{\sum_{j=1}^{M/2}(y_j - x_{ij})^2}{2N}\right).$$

Interpreting

$$\sum_{j=1}^{M/2}(y_j - x_{ij})^2$$

as the squared Euclidean distance between the vectors y and $x_i$, we see that the maximum-likelihood decision rule is to choose the input closest in Euclidean distance to the received signal.

The case M = 4 is illustrated in Fig. B-1, where we have drawn in the lines marking the boundaries of the decision regions. There is perfect symmetry between the four inputs. If one of them, say (L, 0), is selected, the probability of error is the probability that the received signal will lie outside the decision region that contains (L, 0). If we let $E_1$ be the event that the received signal falls on the other side of the line AB from (L, 0), and $E_2$ the event that it falls on the other side of CD, then it can readily be shown by a 45° coordinate rotation that $E_1$ and $E_2$ are independent, and that each has probability

$$p = \frac{1}{\sqrt{2\pi N}}\int_{L/\sqrt{2}}^{\infty} e^{-y^2/2N}\,dy = \frac{1}{\sqrt{2\pi}}\int_{L/\sqrt{2N}}^{\infty} e^{-z^2/2}\,dz.$$

The probability that neither occurs is $(1-p)^2$, so that the probability that at least one occurs, which is the probability of error, is

$$q = 2p - p^2.$$

When M > 4, the symmetry between the inputs still obtains, so let us suppose the transmission of

$$x_1 = (L, 0, \dots, 0).$$

Let $E_j$, $2 \le j \le M/2$, be defined as the event in which the received signal is closer to one of the three vectors $x_j$, $x_{-j}$, $x_{-1}$ than to $x_1$. Then the event $\varepsilon$ of an error is the union of these events,

$$\varepsilon = \bigcup_{j=2}^{M/2} E_j.$$

But the probability of any one of these events is q. Thus, by the union bound,

$$\Pr(\varepsilon) \le \sum_{j=2}^{M/2}\Pr(E_j) = \left(\frac{M}{2} - 1\right) q. \tag{B.11}$$

When the signal-to-noise ratio $L^2/N$ is large, the bound of Eq. B.11 becomes quite tight. To calculate p, we use an approximation of Hastings.⁴⁰ Viterbi⁴¹ has calculated the exact value of the probability of error for $3 \le R_N \le 10$; we have fitted curves to his data in the low signal-to-noise range, and used the bound above elsewhere, so that over the whole range the probability of error is given correctly within one per cent. When $R_N \ge 11$, the union bound is used for all signal-to-noise ratios.
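The modulation formulas can be sketched in a few lines: p is a Gaussian tail, $q = 2p - p^2$, and (B.11) multiplies q by M/2 - 1. The Monte Carlo check is our addition, not part of the original computation. It applies the maximum-likelihood rule stated above (for biorthogonal signals, pick the matched filter with the largest $|y_j|$ and take its sign) with a fixed seed, and assumes noise variance N per dimension, as in the density above. Hastings' approximation and Viterbi's exact values are not reproduced, and the function names are illustrative.

```python
import math
import random


def gauss_tail(x):
    """Q(x) = Pr(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def biorthogonal_union_bound(m_signals, L, N):
    """(B.11): (M/2 - 1) q with q = 2p - p^2 and p = Q(L / sqrt(2N)); exact when M = 4."""
    if m_signals < 4:
        raise ValueError("M = 2 is the trivial antipodal case")
    p = gauss_tail(L / math.sqrt(2 * N))
    q = 2 * p - p * p
    return (m_signals // 2 - 1) * q


def simulate(m_signals, L, N, trials, seed=1):
    """Monte Carlo error rate of ML detection when x_1 = (L, 0, ..., 0) is sent."""
    rng = random.Random(seed)
    sigma = math.sqrt(N)                      # noise variance N per dimension
    errors = 0
    for _ in range(trials):
        y = [rng.gauss(0.0, sigma) for _ in range(m_signals // 2)]
        y[0] += L
        best = max(range(len(y)), key=lambda k: abs(y[k]))
        if best != 0 or y[0] < 0:             # wrong filter, or right filter with wrong sign
            errors += 1
    return errors / trials


print(biorthogonal_union_bound(4, 2.0, 1.0), simulate(4, 2.0, 1.0, 20000))
```

For M = 4 the two numbers should agree up to sampling noise, since q is exact there; for larger M the simulated rate should fall at or below the union bound.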

Finally, we have the problem of bounding the deletion and error probabilities when the detector deletes whenever the magnitude of the output of some matched filter is not at least D greater than that of any other. Figure B-2 illustrates the decision and deletion regions, again for M = 4. It is clear that the probability of not decoding correctly is computed exactly as before, with L replaced by L - D; this probability overbounds and approximates the deletion probability. The probability of error is overbounded, not tightly, by the probability of falling outside the shaded line DEF, which probability is computed as before with L replaced by L + D.

When  M>4,  the  union  bound  arguments  presented  above  are  still  valid,  again  with  L 
replaced  by  L-D  for  deletion  probability  and  by  L+D  for  error  probability. 

The  case  in  which  M  =  2  is  trivial. 
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